ABSTRACT



This report examines grading practices, the uses of
grades and the influence of grades on the student, faculty,
administration and society. The author also indicates how grading
practices can be and are being altered to provide an educational tool
that accurately reflects the many dimensions of student performance.
It is noted that grades seem unnecessary for many of the
administrative purposes within an institution, other than as an
indicator that a certain course has been passed by a student.
Furthermore, selection for academic awards, honor programs or special
classes could be based on faculty nominations supplemented by
evaluative information provided by the faculty. It is concluded that
more varied and effective grading procedures are available; however,
that they are seldom employed may be caused by the uncertainty over
what is really wanted of grades. In light of this, the components and
structure of grades need closer scrutiny so that the issues raised by
the grading process--involving, as they do, all levels of
society--can be dealt with.
FOREWORD



In this comprehensive review of the literature, Jonathan Warren examines grading 
practices, the uses of grades and the influence of grades on the student, faculty, 
administration and society. He also notes the relationship of grades to the social structure 
and cites a need for a clear definition of the purposes of grades. The author, a Research 
Psychologist with the Educational Testing Service, indicates how grading practices can 
be and are being altered to provide an educational tool that accurately reflects the many
dimensions of student performance. 

Part of a series of reports on various aspects of higher education, this paper
represents one of several kinds of Clearinghouse publications. Others include short
reviews, bibliographies, and compendia based on recent significant documents found both
in and outside the ERIC collection. In addition, the current research literature of higher
education is abstracted and indexed for publication in the U.S. Office of Education’s
monthly volume, Research in Education. Readers who wish to order ERIC documents
cited in the bibliography should write to the ERIC Document Reproduction Service,
Leasco Information Products, Inc., 4827 Rugby Avenue, Bethesda, Maryland 20014.
When ordering, please specify the ERIC document (ED) number. Payment for microfiche
(MF) or hard/photo copies (HC) must accompany orders of less than $10.00. All orders
must be in writing.



Carl J. Lange, Director 

ERIC Clearinghouse on Higher Education 



March 1971 
I. INTRODUCTION



In 1964, a report of a conference on grades in higher
education emphasized that grading was perennially overlooked
when the processes of higher education were
considered. The conferees pinpointed some neglected but
important questions about grading, such as (Teaf, 1964):

• Do we know the effects of grades on the educational 
process? 

• Can grades be justified as incentives to learning?

• What aspects of student behavior are reflected by 
grades? 

• What function do grades serve in selection procedures? 

• Can alternative devices serve the functions of grades and 
eliminate their shortcomings? 

Grading has slowly emerged from an area of neglect to 
become a widely discussed, controversial topic. But focus 
on grades, though intense, has been haphazard, in that only 
one or two of the issues raised in 1964 have been examined
while other, more inclusive questions have been almost 
totally ignored. 

More than one reader interpreted the review as a biased
appeal for the abandonment of grades. The review is, of
course, biased, but the favored position is that grading
practices should be improved to make them serve their
various intended purposes more effectively, not that they
should be abandoned. This position implies that purposes
should be identified and the effectiveness of grading
procedures in serving those purposes should be compared
with that of alternative procedures. The value of accomplishing
the purposes should also be weighed against the
cost of grading in terms of the expenditure of educational
resources and of whatever undesirable side effects can be
demonstrated. Desirable side effects should be weighed in
favor of grading. The desirability or undesirability of side
effects, however, is itself likely to be a disputed issue and
would also merit study.

An initial bias to the effect that grading procedures
could be improved has been immensely strengthened as the
literature was read and evaluated. Present grading procedures
are monolithic at the same time that higher
education is increasing in diversity and complexity. Techniques
of information processing and management, incorporating
the basic functions of grades, are growing in
power, subtlety, and refinement while grading processes
remain at a standstill. The biases that pervade the following
pages, therefore, are that present grading practices are
inadequate to their intended tasks; that possibilities for
improvement are enormous and should be pursued; that
purposes as well as practices require examination; and that
the social as well as educational effects of grading are too
important to be neglected any longer.

Of approximately 200 articles, papers, and reports about
grades appearing from 1965 to 1970, about one-fourth
considered the form of grades, especially whether Pass-Fail
should replace A through F. Another one-fourth concerned
the use of undergraduate grades to predict grades in
graduate and professional schools. Therefore, one half of
the recent literature on grading was occupied with only two
limited aspects of grades: their external form and their
predictive relationship to later grades. The remaining half of
the literature ruminated over a variety of topics--variability
in grading standards, disadvantages of grades, effects of
grades on students, use of grades to predict occupational
success, determinants of grades, and the social effects of
grades--none of which appeared in as many as 10 percent of
the total publications. (Excluded from this count is the
large number of articles on the prediction of undergraduate
grades.)

These reports, in spite of their variety, leave large gaps in
our knowledge about grades and grading. They lead to only
a few general statements that can be made with much
confidence: students approve of Pass-Fail grading, but when
offered a Pass-Fail option, they often don’t elect the option
to take courses they otherwise would not have taken; deans
and others concerned with admission to graduate and
professional schools disapprove of Pass-Fail grading in
undergraduate colleges; undergraduate grades predict first-year
graduate and professional school grades about as well
as they have for years, but not very well most of the time,
occasionally quite well, occasionally not at all.

These results do not constitute an impressive advance in
knowledge about an important, ubiquitous process in
higher education. Still neglected, except in occasional
speculative musings, are questions about the purposes of
grades. For example: Are the purposes worthwhile? If so,
are they well served? Are the frequent criticisms of grades
justified? If so, can ways be found to serve the purposes of
grades without the deficiencies of present procedures?
While experiments with Pass-Fail procedures and prediction
studies touch on parts of these questions, the basic issues
remain obscured.
II. GRADING EFFECTIVENESS



Grades can be defined as sets of symbols that represent a
level of academic achievement indicated by some form of
evaluation. Their purpose is to condense the results of
evaluation into a form simple enough for a continuing and
cumulative record of student accomplishment to be maintained.
The grading process is therefore not the evaluation
process, but follows it. The effectiveness of a grading
system can be examined with respect to (1) the fidelity
with which it encodes evaluation results, (2) the ease with
which it lends itself to recordkeeping, and (3) the
adequacy of the information it conveys for the users of
grades.

Fidelity 

The translation of evaluation results into symbols is the
most critical process in a grading system. Unless enough
useful information is encoded by the grade symbol,
effective functioning in other respects is almost worthless.
As long as grades consist of a single symbol assigned for
each course, they can convey information only on a single
dimension, although several different kinds of performance
might be observed in the evaluations on which the grade is
based.

The term most commonly applied to the complex
dimension grades are intended to measure is academic
achievement. Yet academic achievement is itself defined
only in terms of composites of course grades. It has no
independent definition against which the validity of course
grades can be checked.

The poor fidelity of grade symbols is largely responsible
for the sparseness of the meaning in academic achievement
(Ericksen, 1966; Trow, 1968). The grading process begins
with an individual instructor evaluating a variety of student
performances--responses to test questions; the quality of
thinking, understanding, grasp of factual detail, integrative
ability, and fluency of expression found in written papers;
the evidence of student preparation, understanding, and
interest revealed in class discussions; and whatever other
kinds of evidence the instructor considers relevant to his
definition of achievement in that course (a definition that is
probably unspecified). Indicators of all these components
of achievement are then weighted and combined into a
single scale, often inappropriately because of differences in
the variances of the indicators (Lacey, 1963). The composite
measure is reasonably reliable, with respect both to
internal consistency and test-retest reliability over periods
shorter than a year. Grades may therefore be accurate in
reflecting performance on some undefined dimension of
academic achievement. But their fidelity is poor in that
they transmit only a small part of the information in the
evaluations that led to the grade while leaving the information
they do transmit difficult to interpret.
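
The weighting problem noted above can be illustrated with a small sketch. The scores, the two component names, and the instructor's intention to weight them equally are all hypothetical, chosen only to show how a component with a large spread can dominate a raw sum; none of it is taken from the studies cited.

```python
from statistics import mean, pstdev

# Hypothetical scores for four students on two components that the
# instructor intends to weight equally. Exam scores spread widely;
# paper scores are bunched together and run in the opposite order.
exam = [40.0, 60.0, 80.0, 100.0]
paper = [90.0, 88.0, 86.0, 84.0]


def zscores(values):
    """Rescale values to mean 0 and standard deviation 1."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def ranking(composite):
    """Student indices ordered from highest to lowest composite."""
    return sorted(range(len(composite)), key=lambda i: composite[i], reverse=True)


# Raw sum: the exam's larger variance decides the order by itself.
raw = [e + p for e, p in zip(exam, paper)]

# Standardized sum: each component now carries equal effective weight,
# and in this contrived case the two components cancel out.
standardized = [e + p for e, p in zip(zscores(exam), zscores(paper))]

print(ranking(raw))
print([round(abs(c), 6) for c in standardized])
```

In this contrived case the raw composite simply reproduces the exam ranking, while the standardized composite shows the four students as indistinguishable. Whether standardizing is the right remedy in a given course is a separate judgment; the sketch only shows that nominal weights and effective weights can differ.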







Recordkeeping

Recordkeeping is facilitated by dividing achievement
into some arbitrary number of segments. The number can
range from two to more than 100 (some military institutions
using a 400-point scale). Dressel and Nelson (1961)
noted the hardiness of the five-category scheme for
segmenting the achievement continuum, pointing out that
departures toward more or fewer categories ultimately
revert to five. Two- and three-category schemes are either
modified by pluses and minuses or otherwise subdivided,
and schemes of more than five tend to have categories
merged.

The division of the achievement continuum into segments
or grade categories and the location on the continuum
of the boundaries between categories and of
instances of student performance are problems that have
absorbed much attention. Comparisons of departments
within institutions and of faculty members within departments
as to their choices of location for the boundaries
between grade categories are common (e.g., Juola, 1968).
Attempts to find some reasonably stable common standard
on which to anchor the achievement dimension are less
frequent but still common (Anderhalter, 1962; Berdie,
1965; Fricke, 1965; Grant, 1956). Currently the
number-of-categories issue--whether five or two--is being vigorously
debated.

The convenience of a limited number of categories, or
the difficulty in using more than about five, probably
accounts for the strong tendency noted by Dressel and
Nelson (1961) to reduce larger numbers of categories
despite demonstrations that using fewer categories necessarily
decreases grading accuracy (Ebel, 1969). The common
practice of placing the boundaries between grades at
points on test score distributions (or on distributions of
accumulated points in a course) where breaks occur
between adjacent scores is another accommodation to
convenience. No justification can be found for assuming
that gaps in score distributions have any relationship at all
to what are presumed to be commonly accepted categories
of performance. For convenience, something that occurs by
happenstance is used to define the boundaries between
grade categories.
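
One way to see the direction of the effect of fewer categories is to correlate a set of underlying scores with the category each score falls into. The sketch below is an illustration only, not the analysis in the cited work: the 100 evenly spaced scores, the equal-width categories, and the function names are assumptions made for the example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def categorize(score, k, top=100):
    """Map a score in [0, top) to one of k equal-width categories 0..k-1."""
    return int(score * k / top)


# Hypothetical underlying scores: one student at each point from 0 to 99.
scores = list(range(100))

r_two = pearson(scores, [categorize(s, 2) for s in scores])
r_five = pearson(scores, [categorize(s, 5) for s in scores])

print(round(r_two, 3), round(r_five, 3))
```

Under these assumptions the five-category grades track the underlying scores more closely than the two-category grades do. The sketch says nothing about what the underlying scores mean, which is the separate question of fidelity.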

Kirby (1962) pointed out that at one “rather large
institution of good reputation” discontinuities at the
boundaries between grade categories can be expected to
cause 42 percent of the students to gain or lose, relative to
their precise position on the achievement dimension, one
gradepoint or more in 15 units of class. One percent of the
students will gain or lose five gradepoints or 0.33 points in
their gradepoint average for that semester solely because of
errors due to discontinuity between grade categories.
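
The arithmetic behind such gains and losses can be sketched for a single student. Everything in the example is hypothetical: the precise positions on a 0-to-4 achievement scale, the three-unit courses, and the assumption that category boundaries fall midway between whole grade values. It is not a reconstruction of the cited analysis and will not reproduce its percentages.

```python
from math import floor


def recorded_grade(position):
    """Whole grade category (0 = F ... 4 = A) nearest a precise position.

    Assumes boundaries midway between categories (an illustrative choice).
    """
    return min(4, max(0, floor(position + 0.5)))


def discontinuity_effect(positions, units_per_course):
    """Gradepoints gained (+) or lost (-) when precise positions are
    replaced by whole-category grades, and the resulting shift in the
    gradepoint average."""
    total_units = sum(units_per_course)
    gain = sum(
        (recorded_grade(p) - p) * u
        for p, u in zip(positions, units_per_course)
    )
    return gain, gain / total_units


# Hypothetical student: five 3-unit courses, each just above a boundary.
positions = [2.6, 3.7, 2.55, 1.8, 3.9]
units = [3, 3, 3, 3, 3]

gain, gpa_shift = discontinuity_effect(positions, units)
print(round(gain, 2), round(gpa_shift, 2))

# Five gradepoints spread over 15 units is a third of a grade in the average.
print(round(5 / 15, 2))
```

A student who happens to fall just above the boundary in every course gains several gradepoints in a 15-unit semester, and a student just below in every course loses as many, although neither differs much in precise position.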
Purposes of grade information

The users of grade records seldom seem concerned about
the nature of the information conveyed. Yet the adequacy
of the information encoded in grades is dependent on the
purposes for which grade records are used. Different
purposes might reasonably be assumed to call for different
kinds of information, and purposes therefore merit examination.
Is the substantial expense borne by an institution
to maintain grade records justified to provide personnel
evaluation and selection services to other agencies
(Goodman, 1964; Jencks and Riesman, 1968)? What are
the internal purposes for which grade records are maintained?
Could several simpler and collectively less expensive
procedures serve the same purposes? For example, should
procedures for advising students about course selection and
procedures for determining eligibility for extracurricular
activities depend on the same set of records? Would the
separation of records for different functions improve the
effectiveness of each?

Eligibility for veterans’ benefits, retention of scholarship
awards, and draft status have depended on a student
maintaining satisfactory academic standing. The kind of
information required for these purposes differs substantially
from that required by a graduate department selecting
10 students from among 80 applicants. Yet both kinds of
purposes now depend on the same source of information,
even though the graduate department may supplement the
overall gradepoint average with other information. The
primary information in grade records is still some kind of
weighted average of the one-dimensional course grades.
Each grade is a composite of a number of varied kinds of
judgments, each composite differing in some unknown way
from the others. Then these poorly defined composites are
averaged into something that can only represent whatever
does not distinguish a good memory from depth of
understanding, or sensitivity to professors’ preferences from
imaginative synthesizing of disparate elements, or problem-solving
ability from expository fluency.

The information in gradepoint averages may or may not
be adequate for its purposes, but it can be no better than
the information encoded in the original grades and is, in
fact, substantially less than the total information in the
collection of course grades. Deans and admissions officers
who object to two-level, Pass-Fail grading on the ground
that they need the greater amount of information in
five-level, A through F grading systems are partly deluding
themselves, for the additional information in five-level as
opposed to two-level grading is almost uninterpretable. A
greater number of grade categories does carry more
information in a technical sense; differentiation among
students is more accurate with five grade levels than with
two (Ebel, 1969). But interpreting that increment in
information--recovering from it the meaning that was in the
original evaluation--is essentially impossible. The meaning is
lost in translating a variety of evaluations into one
dimension of achievement and then averaging performances
on a number of these essentially undefined but probably
quite different dimensions into a single index. Consequently,
gradepoint averages are reliable measures of an
undefined entity.
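
The loss of information in averaging can be shown with two invented transcripts. The course names, grades, and the conventional 4-point values are assumptions for illustration; the point is only that quite different records collapse to the same index.

```python
# Conventional point values, assumed here for illustration.
GRADE_POINTS = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}


def gradepoint_average(record):
    """Unit-weighted average of a list of (course, letter, units) entries."""
    points = sum(GRADE_POINTS[letter] * units for _, letter, units in record)
    units = sum(units for _, _, units in record)
    return points / units


# Two hypothetical students with very different patterns of performance.
steady = [
    ("Chemistry", "B", 3),
    ("History", "B", 3),
    ("Calculus", "B", 3),
    ("Literature", "B", 3),
]
uneven = [
    ("Chemistry", "A", 3),
    ("History", "C", 3),
    ("Calculus", "A", 3),
    ("Literature", "C", 3),
]

print(gradepoint_average(steady), gradepoint_average(uneven))
```

Both records yield the same average, so a reader given only the index cannot tell the uniformly good student from the one who excels in some fields and is mediocre in others. The course grades themselves have already discarded the kinds of performance that produced them.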

What the achievement dimension represents is ignored in
most of the controversy over the desired characteristics of
the scale used to indicate different regions on the dimension.
However accurately different levels of achievement
can be located on a scale, the symbol assigned to a point on
the scale can include no more information than can be
represented by that single dimension. The processes by
which dissimilar kinds of performance are collapsed onto a
single dimension, and even what those dissimilar kinds of
performance may be, are lost in the concern over how
many categories to break the dimension into and how to
assign students to those categories. The fidelity--not the
accuracy--of the translation of evaluations into grade
symbols is therefore one of the most critical issues in
grading, and one of the most neglected.



Evaluation and grading as distinct processes 

The preceding discussion distinguished grading from
evaluation. Grades are the symbols that formally indicate a
student’s general level of academic performance. Evaluation
consists of the variety of processes--reading papers, giving
quizzes, lab exercises, and exams, asking questions, listening
to discussions, observing the quality of student questions--
by which faculty members arrive at judgments about
student accomplishment.

The failure to distinguish between evaluation and grading,
or the assumption that the two processes are one,
frequently leads to fruitless debate. Faculty members have
spoken against reducing the number of categories in a
grading system because they believe evaluation of student
performance would be hampered. Yet faculty evaluation of
student performance and the communication of its results
to students can be carried out with no reference whatever
to grades. The institutional demand that grades be assigned
may force some instructors to evaluate students even if
they see no need to do so. But grades in no way preclude
evaluation, whatever their form.

Evaluation used primarily to improve student performance
by serving a feedback function, by informing
students of their progress while performance is still fluid,
still being developed, has been termed formative evaluation.
Summative, or terminal, evaluation, in contrast, is intended
to provide an appraisal of the final level of performance at
the end of some period of instruction or at some point of
discontinuity, more or less arbitrarily defined, as when a
student has completed 15 weeks of instruction (Scriven,
1967). Grading is usually associated with summative evaluation,
which often requires a different set of procedures to
be most effective than does formative evaluation (Bloom,
1968; Husek, 1969). Summative evaluation and grading
may also be carried out by an agency other than the one
providing the instruction. Formative evaluation, being part
of the instructional process, must stay within the control of
the instructor.



III. GRADING SYSTEMS 



Pass-Fail grading 

The primary controversy over grading at present is
whether multilevel grading systems, usually the five-level A
through F system, should be replaced with a two-level
Pass-Fail system. Distinctions among several levels of
acceptable or adequate performance and one failing level
would be replaced with the single distinction between
acceptable and unacceptable performance.

Although some form of Pass-Fail grading has become
common in the last 5 years (Benson, 1969; Buchman, 1970;
Burwen, 1970; Quann, 1970), only a handful of colleges
have put their entire grading system on a Pass-Fail basis.
The typical procedure is to offer students an option of
taking a limited number of Pass-Fail courses with the rest
graded on the standard A through F basis. Dartmouth’s
option procedure is representative of most. Students were
permitted to take one Pass-Fail course per term provided
the course was not in the student’s major field (Feldmesser,
1969). Other colleges limit the option to seniors, upperclassmen,
or those with gradepoint averages above some
minimum. Courses in the student’s major field are almost
always excluded, while courses needed to satisfy foreign
language or mathematics requirements sometimes are and
sometimes are not excluded. More than one Pass-Fail course
per term is seldom permitted. With these limitations, few
students complete college with more than 10 percent of
their grades Pass-Fail (Pass-Fail Study Committee, 1969).
The consequences of Pass-Fail grading, undertaken with
great trepidation and concern, have been trivial.

The most common reason for adopting a Pass-Fail
option is to encourage students to take courses they
otherwise might not risk for fear of jeopardizing their
gradepoint average (Benson, 1969; Feldmesser, 1969;
Freeman, 1969; Johansson, Rossmann, and Sandell, 1970;
Melville and Stamm, 1967; Milton, 1967; Morishima and
Micek, 1970; Quann, 1970; Sgan, 1969; Stallings, Smock,
and Leslie, 1968; Wharton, 1969). Students were expected
to feel freer to explore unknown areas and to try courses in
which they feel some insecurity. However, they have not
used the Pass-Fail option for this purpose to any great
extent.

At each of five institutions (Dartmouth, Princeton,
Wellesley, the University of Michigan, and the University of
Washington) where students were surveyed after initiation
of a Pass-Fail option, roughly 75 to 85 percent of the
students who elected to take a Pass-Fail course said they
would have taken the course anyway (Cromer, 1969;
Feldmesser, 1969; Karlins, 1969; Morishima and Micek,
1970). At Brandeis the pattern of course enrollment
showed little change after a Pass-Fail option was instituted
(Sgan, 1969). Some students apparently do take courses
they would not take other than under a Pass-Fail option,
but the number is not large.

Another common reason for adopting Pass-Fail grading
procedures is to reduce student anxiety over grades. When
asked, students have reported feeling less anxious in
Pass-Fail courses (Cromer, 1969; Karlins, 1969; Melville and
Stamm, 1967). In this respect the Pass-Fail option seems
successful, although retrospective reports about emotional
responses are typically not reliable indicators of actual
responses. More dependable are the student reports that say
overwhelmingly (but not unanimously) they like Pass-Fail
grading. Students when surveyed inevitably urge continuation
and expansion of limited Pass-Fail option procedures
(Cromer, 1969; Ericksen, 1967; Karlins, 1969; Melville and
Stamm, 1967; Milton, 1967; Morishima and Micek, 1970;
Priest, 1969).

Other reasons given for Pass-Fail grading are to shift
students’ efforts from grade-getting to learning (Benson,
1969; Committee on Educational Policy, 1970; Feldmesser,
1969; Milton, 1967; Quann, 1970; Sgan, 1969); to let the
teacher function as mentor rather than judge (Committee
on Educational Policy, 1970); to avoid the pretense that
students are evaluated more accurately than is the case
(Benson, 1969); and to give students greater control over
the allocation of study time (Milton, 1967). While these
seem plausible expectations to hold for Pass-Fail grading,
only the last can be supported by evidence (Ericksen, 1967;
Feldmesser, 1969; Freeman, 1969; Karlins, 1969;
Morishima and Micek, 1970).

The tendency of students to slight courses graded
Pass-Fail in order to concentrate on other courses has been
offered as a defect in Pass-Fail options. Yet the view that
student control over their distribution of effort is desirable
seems more defensible (Milton, 1967). A course may have a
particular interest or be particularly important to a
student’s major field or be more difficult for him than
others. These all seem good reasons for students to adjust
their effort unevenly across different courses. Elton (1968)
and Feldmesser (1969) have used similar arguments to
propose schemes for variable weighting of course grades
with the students choosing the weights to be assigned.

One might speculate that what some faculty members
object to is not the differential allocation of effort to
different courses as much as the possibility that students
may go through college, or at least through some courses,
without expending an acceptable amount of effort. Instructors
who use grades as a device for coercing students
into kinds of behavior the instructor considers desirable
(Mayhew, 1969) or who adjust their grades according to the
amount of effort the students are believed to have
expended (Axelrod, 1964) might be expected to feel
chagrined when students manage to learn without going
through the tasks set by the instructor. This view cannot be
advanced on the basis of clearcut evidence; its plausibility
can only be inferred from unsystematic observations and
experience and the general expectation that people with
hostile, punitive proclivities can be found among college
professors as well as elsewhere.

The major objection to Pass-Fail grading is the problem
of graduate and professional school admission. However,
that objection is serious only if a substantial part of a
student's record consists of Pass-Fail grades, something that
occurs in only a few colleges. At the University of
California at Santa Cruz, one of the few institutions where
most grades are Pass-Fail, more than half of the graduate
school aspirants among the 1969 graduates reported they
encountered no problems in gaining admission. Nine percent
did report problems and another 35 percent were not
sure (Pitcher and Bosler, 1970). Although the Pass-Fail
grading system had affected graduate school selection to
some extent, most students who applied were admitted,
although not always to the school that was their first
choice. Perhaps more serious than not attending a first-choice
graduate school was the loss of fellowships as a
consequence of the Pass-Fail transcript. This did occur but
its frequency is not known.

Whitman College reverted to the customary A through F
system after 15 years of Pass-Fail grading primarily because
of difficulties encountered by student transfers and by
graduates applying to graduate schools (Perry, 1968). Yet
perhaps because of the growth in concern over grading since
Whitman's abandonment of Pass-Fail, the difficulties encountered
by Santa Cruz graduates were not considered
great enough to induce a similar action there (Committee
on Educational Policy, 1970). The prestige of the undergraduate
institution may also affect graduate admission,
although Whitman's difficulties occurred in spite of a strong
academic reputation.

Pass/No Record grading 

A system similar to Pass-Fail grading has been proposed
in which failure results in removal of the course from a
student's record. The primary argument for Pass/No Record
grading is that failure to achieve an adequate level of
performance in a course should not result in a penalty to
the student. He should simply not be given credit for the
course. Brown University has instituted such a procedure
and Stanford is considering a recommendation to do so
(Laid, 1970). Several other institutions have either tried
this system on an experimental basis or are considering it
(Christin, 1970; Goldstein and Tilker, 1969; Smith, 1969).

A number of junior colleges have instituted some form 
of this “nonpunitive” grading. All D's and F’s are replaced 
by W's, which indicate only that the student should not be 
given credit for completing the course. Recordkeeping 
practices differ. At Santa Fe Junior College in Florida the 
failed course does not appear on the student's record 
(Fordyce, 1969-70), making it a Pass/No Record system 
(“Pass-Erase” in the Stanford terminology). Other junior 
colleges record the “No Credit” grade, which means that a
record of attempting a course and not passing it is
maintained (Brooks, 1968; Smith, 1969); however, the
course is not counted when a student's gradepoint average
is computed. The failed course can be repeated as often as
the student chooses until he passes. 

However, many colleges and universities, and probably
most, require a minimum gradepoint average for a student
to be readmitted the following academic year. This means
that standards are raised for the second academic term
relative to the amount a student has fallen below the
acceptable first-term level. Making the hurdles higher as the
performance level drops seems an unreasonable procedure.
Pass/No Credit grading avoids that situation.
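
The rising hurdle can be made concrete with a short calculation. The 2.0 minimum, the 15-unit terms, and the function name are hypothetical, chosen only to illustrate the arithmetic of a cumulative gradepoint requirement.

```python
def required_next_term_gpa(minimum, first_gpa, first_units, next_units):
    """Average needed in the next term for the cumulative gradepoint
    average over both terms to reach the stated minimum."""
    total_needed = minimum * (first_units + next_units)
    already_earned = first_gpa * first_units
    return (total_needed - already_earned) / next_units


# Hypothetical institution requiring a 2.0 cumulative average,
# with 15 units attempted in each term.
for first_term in (2.0, 1.5, 1.0):
    needed = required_next_term_gpa(2.0, first_term, 15, 15)
    print(first_term, needed)
```

Under these assumptions a student who earned 1.5 in the first term must earn 2.5 in the second, and one who earned 1.0 must earn 3.0. When unsatisfactory courses are left out of the average altogether, no such deficit is carried forward.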

Some faculty members object to Pass/No Record grading
because a student could stretch out indefinitely the time he
spends accumulating enough units to graduate. At colleges
having a student body homogeneous in previous preparation
and aptitude this might be a valid objection. On the
other hand, though, in a homogeneous student body the
number of students stretching their time in college inordinately
would probably be small. The basic argument is
whether students taking courses in which they can fail
without penalty would constitute an inefficient use of the
institution's resources. No one knows.

At junior colleges, where substantial proportions of
entering students have not been successful in previous
educational settings, early demands for a uniform level of
performance seem particularly questionable. Many junior
college entrants need a period of adaptation to college, and
the Pass/No Credit system allows this to them. At more
selective institutions like Stanford and Brown, the same
opportunity for adaptation might be desirable if heterogeneity
in the student body were to be increased. In any
case, though, whether students are to be permitted to move
through college at varying rates is a question to be decided
on its own merits. That decision should then enter into
consideration of whether or not to record unsatisfactory
performance in a course.

Marshall (1968) has described in some detail the process
by which the faculty in a department of a medical school
reached a decision about grades. After extended discussion
of various procedures, one faculty member observed that in
the particular situation in their department the most useful
distinction to be made with respect to student performance
was between students who had clearly mastered the content
of a course and those about whom there was some
question. The most sensible grading scheme, and the one
that was then adopted, was Clear/Not Clear with respect to
evidence of course mastery. In most graduate and professional
schools and in selective undergraduate colleges, the
number of students who are clearly incapable of mastery at
an acceptable level of some program of courses is likely to
be quite small. Whether or not Clear/Not Clear would be
the most appropriate grading scheme at all such institutions,
as it seemed to be at the University of California
Medical Center, consideration of the nature and purposes of
grades at that institution should underlie the selection of a
grading system.



Descriptive grading 

Descriptive grading, which historically preceded the 
various symbolic grading scales being used (Smallwood, 
1935), consists of written comments that describe the
student’s performance. It is not based directly on any scale
of academic performance except the implicit and intuitive
scales that underlie a professor’s judgments. Both the
nature and the level of the performance are described by
the instructor, and both may vary within a single class or
course. This specification of the nature of the performance
that leads to a judgment of excellent, good, or poor is the
major distinction between descriptive grading and symbolic
grading. With symbolic grading scales, differences that
commonly exist in the nature of the performance evaluated
are lost.

The most serious drawbacks to descriptive grading are 
the time required for faculty members to write the
descriptive comments and the difficulty in making quick
and simple comparisons of performance descriptions. Form-
ing a judgment about which letter grade to assign a student
is often, though not always, easier and less time-consuming
than writing a descriptive comment about a student's
performance. Comparing the capabilities of two students is
also easier if each is described by a numerical gradepoint
average instead of a set of instructor comments. But while
the process of comparing is simpler with gradepoint
averages, the information on which the comparisons are
based is probably far greater with descriptive comments.
Even a terse, relatively barren comment, such as “good
student,” that opponents of descriptive grading point to as
illustrative of its weakness, is at least as informative as “B.”

The most detailed accounts of long-term experiences 
with descriptive grading are those of Sarah Lawrence
College (Murphy and Raushenbush, 1960) and the micro-
biology department of the University of California Medical
Center (Marshall, 1968). The University of California at
Santa Cruz has used a combination of Pass-Fail and
descriptive grading since its opening in 1965 (Committee on
Educational Policy, 1970), and a few other colleges, usually
small, selective, liberal arts colleges such as Bennington and
Goddard, have used descriptive grading or some combina-
tion of description and symbolic grades.



The strengths and weaknesses of descriptive grading are 
closely associated with the purposes for which grades are 
intended. Its major strength is in specifying the dimensions 
of the evaluated performance. If feedback to students can 
be accepted as a grading function, descriptive grading can 
be superior to other forms. It may not be superior if the 
descriptions are inaccurate, misleading, or uninformative, 
but its potential for conveying information is far greater 
than that of symbolic grading.

The weakest aspect of descriptive grading is its cumber-
someness for selection and other administrative processes
involving large numbers of students. Yet modern infor-
mation storage and retrieval techniques appear able to
manage descriptive grading as effectively as symbolic
grading has been managed in the past. Recording and
storing prose descriptions of student performance seem
feasible. From the stored descriptions, reports of student
performance could be compiled to summarize only those
elements relevant to the purposes for which the report is
intended. Selection for employment and selection for
graduate education are two purposes that might be ex-
pected to rely on evaluations of different kinds of student
performance and therefore would require different reports. 

Other grading systems 

The grading procedure at the University of Surrey
combines level of student performance with course diffi-
culty, difficulty being determined by both course level and
intensity (Elton, 1968). Student performance is judged in
conventional ways from examinations, essays, projects, and
other course work. The student's grade is then the product
of his level of performance and the difficulty of the course.
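
As a rough illustration of the Surrey arrangement described above, the sketch below multiplies a performance level by a course difficulty weight. The numeric scales, the function name, and the example values are assumptions made only for illustration; the scales actually used at Surrey are not given here.

```python
def weighted_grade(performance: float, difficulty: float) -> float:
    """Return the product of performance level and course difficulty.

    Both scales are hypothetical. The only property taken from the text
    is that the grade is the product of the two quantities.
    """
    if performance < 0 or difficulty <= 0:
        raise ValueError("performance must be >= 0 and difficulty > 0")
    return performance * difficulty


# Hypothetical example: the same performance level earns a higher grade
# in a course judged to be more difficult.
intro_grade = weighted_grade(performance=3.0, difficulty=1.0)
advanced_grade = weighted_grade(performance=3.0, difficulty=1.5)
print(intro_grade, advanced_grade)
```

Under a product rule of this kind, a moderate performance in a demanding course can equal or exceed a strong performance in an easy one, which is presumably the intent of weighting by difficulty.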

A joint student-faculty committee at the University of 
California School of Law in Berkeley, after extensive 
interviews with students, faculty, alumni, and employers, 
recommended changes to the existing procedures to give 
them more flexibility and make them more informative 
(Committee on Grading, 1970). The existing system was a 
three-point scale (Top-Middle-Low) with 10 percent of the
students in any class assigned to each of the extreme 
categories and 80 percent assigned to the middle category. 
Faculty and students objected to the rigid proportions in 
which grades were to be assigned and to the lack of
differentiation within the middle category.

The Committee’s recommendation was to use three
levels of passing grades: Excellent, Very Good, and Quali-
fied. Variable proportions of students can be assigned to
each level, depending on the instructor’s judgment of the
overall performance of the class. From 15 to 20 percent of
the students in a class would be graded Excellent, for
example, and from 30 to 35 percent Very Good. The rest
of the students who reach an adequate level of performance
would pass as Qualified. Students who did not reach an
acceptable level would receive an Incomplete, to be
removed by repeating some or all of the course.



The proposed law school procedure is similar to the
ABCX procedure of some junior colleges, where the X
indicates inability to undertake the next course in a
sequence but does not appear in the student’s record, and
can be removed by repeating the course if the student
chooses. The law school system differs by specifying a
range of proportions for each grade and in requiring that
inadequate performance be brought to a satisfactory level. 

A grading procedure that would allow for diversity in 
the kinds of student performance evaluated, and make the
various kinds of performance explicit, has been proposed
by Elbow (1969). He suggests that a list be provided of
those aspects of student performance considered important
by the faculty. The students in any particular course would
then be graded with respect to those qualities listed that the
instructor in that course considered pertinent.

The qualities rated would almost certainly differ across
courses, and they could also differ within a class. Except in
large classes, instructors commonly have different kinds of
information about different students. Qualities observed in
certain students and graded by the instructor may be left
ungraded for other students in the same class because no
occasion for their observation occurred.

The variables graded may also differ at various levels of
performance. An instructor might consider diligence im-
portant in a low-performing student but not relevant to a
high-performing student. Since creative integration of dis-
parate elements into an effective construct may only appear
among high-performing students, rating all students on that
dimension would be unnecessary.

This procedure combines elements of descriptive and 
symbolic grading. The descriptive phrases or dimensions of 
performance are provided in advance, limiting the instruc-
tor’s freedom of invention. Systematic determination of
those qualities most often considered important by the
faculty, however, could make this an unimportant consider-
ation. For those who prefer the present one-dimensional
grading scheme, one of the dimensions offered for rating
might be general academic performance. The number of
levels of each rating could be two or three or more, but a
limited number seems preferable. Recordkeeping and re- 
porting of grades would be somewhat more cumbersome 
than with single grades but would not be a serious problem. 

Hoyt (1966, 1968) made a similar proposal in recom-
mending that grading be multidimensional and reported in
the form of a profile. The primary advantage of such a
scheme is in specifying the nature of the performance
evaluated and intended to be reflected by a grade. Averages
would also be profiled to refer to specified kinds of
performance. Persons using grades for selection would be
able to make their own judgments about the kind of
performance they consider important and would no longer
have to assume that the evaluator had the same views of
what constitutes desirable performance as the selector.

Scouts for professional football teams use a scheme such
as this for grading college players. Six or eight dimensions
are provided on a form for grading players at each position.
Some dimensions are specific to a single position while
others are common to several positions, but distinctions
among different kinds of performance are explicit. Offen-
sive backs, for example, are graded separately on running
with elusiveness and running with power but are not graded
on receiving long passes. Offensive ends are graded sepa-
rately on receiving long passes and receiving short passes
but are not graded on power running. Both backs and ends
are graded on “hands,” the sureness with which they
handle a football (San Francisco Sunday Examiner &
Chronicle, 1971). Academic performance is surely more
complex than football and selection to graduate or profes-
sional school more important for society than selection of
candidates for a job as a football player. But football
selection is carried out with far greater discrimination.

A grading alternative that should not be ignored is
abolishing all grades. This does not mean instructors would
not evaluate student performance in whatever ways are
appropriate or that the results of those evaluations would
not be communicated to the students. But no formal record
would be made of the level of a student’s performance in a
class. Records would only indicate satisfactory completion 
of a course. 

Only one of the major purposes for which grades are 
intended would be jeopardized by their abolition. Other
institutions would have to find other criteria for selection.
In view of the erratic performance of grades in selection,
however, this seems not to be a serious consequence. The 
greater looseness in selection procedures for graduate and 
professional training would probably complicate the tasks 
of admissions officers, but the social benefits of the 
increased heterogeneity of the population entering graduate 
and professional training might well justify the admissions 
officers’ problems. A distinct benefit would be the forcing 
of graduate schools to give closer attention to the selection
process and its purposes. 

The motivational and informational functions intended 
for grades are questionably served if at all. The limited 
evidence available suggests that their motivational effects 
vary with different kinds of students in different kinds of 
situations and may not be great in comparison with other
motivating forces, such as the desire to perform well. The 
informational function of grades is negligible as far as 
students are concerned if the results of the evaluation 
process are effectively communicated to them. The institu-
tion has little need for records of student performance
level. The courses a student has completed satisfactorily are 
enough. Awarding academic honors and financial aid (if 
financial aid is to be based on level of performance) can be 
based on faculty nominations or other derivatives of faculty 
evaluations that would not require grades for all students.

In short, the abolition of grades is not an unthinkable
alternative. It may turn out not to be desirable, depending
on circumstances and the desirability of the purposes, but it
merits consideration. If grades do serve a useful function
that justifies their cost to the institution, that fact should 
be established more surely than it has been. 

A few colleges have apparently functioned well without 
grades in the past and continue to do so. They tend to be 
small, selective liberal arts colleges, medical schools, or
small experimental programs within a college. But the
experiences of these institutions show that in some circum-
stances grades can be abolished without undesirable conse-
quences. Education without grades presents problems, but
these problems may be far less serious and more amenable 
to solution than the problems grades contribute. 



IV. PURPOSES OF GRADES 



Much of the preceding discussion, but particularly the 
distinction between formative and summative evaluation, 
suggests that evaluation and grading procedures have several 
purposes and should vary to accommodate different pur-
poses. Yet the literature on grading almost totally ignores
what purposes grades are intended to serve, except for the
fairly frequent complaint that purposes are too often
ignored (Dyer, 1967; Fricke, 1965; Milton, 1966; Korn,
1969; Scriven, 1969; Westland, 1969; Wolfle, 1968). Even
Thorndike’s presumably comprehensive review in the
Encyclopedia of Educational Research (Thorndike, 1969)
dealt primarily with the difficulties to be overcome if
grading is to improve. Thorndike, and the body of research
he reviewed, treated grades like the weather. They are an
inevitable part of educational life, and the best we can do is
accommodate them. Whether accommodation is preferable
to their abandonment, or whether substantially different
procedures might better accomplish the purposes for which
grades are intended, are apparently seldom considered. 

Grading, according to Scriven (1969), is a fundamental 
educational practice particularly in need of empirical 
investigation with respect to the purposes and values it 
serves. That such investigations have not been made is
attributed to the reluctance of researchers to examine
questions of social values or morality. Such questions
are considered beyond the reach of empirical investigation.
Yet the distinction between facts and values on which
researchers base their avoidance of value-oriented research
is spurious. Decisions involving questions of merit, worth,
or value should have empirical justification.

Stake (1970), in discussing the evaluation of educational 
programs, urged that more attention be given to empirical
studies of the goals and values that determine criteria of
performance. His argument holds equally well for the
evaluation of students. The purposes of evaluation and
selection of the kinds of performance to be evaluated are
issues amenable to empirical study. 

The discussion of grading purposes that follows rests 
only indirectly on empirical data. Studies to guide the 
selection of purposes, to direct educational decisions that 
touch on social values or moral questions, have not been
attempted. The competitive aspect of grading, for example,
has been cited as both desirable and undesirable, yet very
little evidence is available to support either view. Never-
theless, many discussions of grading practices start with an
unexamined assertion that grades have a stated purpose.
Some reports of current practices clearly imply one or more
purposes for grades. From these statements and implica- 
tions, the generally accepted purposes of grades can be 
determined even if the justification for their acceptance 
cannot. 

In one of the few general considerations of grading 
purposes, Ericksen (1967) made two important distinc-
tions. Grades can serve either administrative or educational
functions and both functions, in turn, can serve either
students or the institution and society at large. Grades
provide a reasonably standard way of recording student 
progress and performance for administrative decisions 
about retention or dismissal, selection, transfer, honors, or 
extracurricular participation. Educationally, grades are in- 
tended to help students and professors alike to adjust their 
academic programs and activities to make them most
effective. Although evaluation rather than grading usually
accomplishes this function, this aim is often advanced as
one of the important purposes of grades.

The administrative functions of grades usually serve 
institutions, while their educational functions serve
students as well as institutions. Whether these different
functions conflict, and if they do, how precedence can be
determined among them, are questions that should be
probed through empirical studies.

Sorting and selecting students

By an overwhelming margin, the most commonly dis- 
cussed purpose of grades is their use as a device for 
screening and selecting students for more advanced educa-
tion, employment, fellowships and awards, honors, transfer
to other institutions, and participation in institutional
activities. This is an administrative rather than an educa-
tional function, and serves the institution or society rather
than the student. Its disproportionate attention in the
literature indicates a tacit assumption of priorities that
justifies closer examination.

Glazer (1970a, 1970b) argued for the usefulness and 
importance of grades as a method of ordering students with 
respect to academic merit. Successive selection to higher
educational programs on the basis of merit progressively
differentiates the population with respect to academic
accomplishment. This ensures that as selection becomes
increasingly rigorous, the most capable people face the
most demanding tasks, at least in terms of academic
performance. The resulting concentration of people with
high academic capability further enhances their produc-
tivity. Dispensing some of the rewards of society in
accordance with academic merit is highly defensible in view
of the importance to society of the academically capable.
Grades, as the mechanism by which people are sorted 
according to academic merit, are therefore quite important 
and are a more equitable mechanism for distributing 
society’s rewards than is parental social status, which they 
to some extent replaced. 

Jencks and Riesman (1968) gave a contrary inter- 
pretation. The academic achievement that grades reflect is a 
somewhat circumscribed kind of performance more readily 
attained by members of higher social and economic classes 
than by those of other classes. Yet education is also the 
primary path to higher social and economic status. Con-
sequently, educational selection based on previous per-
formance offers the opportunity for further development
to those already most highly developed and increases the
gap between the lower and upper segments of the popula-
tion with respect to whatever benefits education provides.
As those benefits become more strongly associated with 
power and prestige, formal education can be charged with 
exacerbating already serious social ills. Glazer’s argument
stressed that grades are essential primarily because they do 
differentiate according to academic performance and 
thereby make the distribution of social rewards more 
equitable than they otherwise would be. The critical point 
is whether academic performance is sufficiently important 
to be the basis for the distribution of large social rewards.

Sociologists and other critics of American education 
(Caplow, 1954; Friedenberg, 1970; Katz, 1968; Lauter and
Howe, 1970; Sexton, 1967) have argued that one of its
primary achievements has been to maintain the existing
socioeconomic class structure, smoothing the way to
socioeconomic advancement for those already possessing
the desired social characteristics while systematically
hindering and discouraging others. From the primary grades
up, it is argued, those culturally unattuned to the dominant
social class have been discouraged, scorned, and labeled
incompetent. An important means for producing these
effects has been the teacher-assigned grade, which finds its
justification in its consistency from teacher to teacher and
from year to year. But consistency by itself has little to
recommend it if the substance behind consistency is
docility, compliance, agreeableness, and teacher-approved
deportment instead of intellectual competence.

Whether this criticism is justified or not is difficult to
determine. The view that the educational system maintains
existing social inequities is based on subtle, long-term social
effects that are altered by a variety of other social forces.
Yet it is an enormously important issue that has been
almost totally ignored in research if not in social comment.

A component of the educational process as pervasive as
the system of grading and the resulting grade-based selec-
tion cannot fail to have important consequences for
society. That these consequences go beyond the training of
a technically competent work force, perhaps in undesirable
directions, seems probable. They deserve a kind of atten-
tion not provided by the current arguments over two-level
versus five-level grading schemes.

Selection for advanced education 

Graduate and professional schools are the primary 
beneficiaries of the use of grades for selection; they are also 
the group of “consumers” of grades most concerned about 
departures from traditional patterns (Dale, 1969; Hanlon, 
1964; Hassler, 1969; Iadarola, 1969; Law School Admission
Test Council, 1970; Rosser, 1970; Rossmann, 1970; Sparks,
1969). The deans of 230 graduate schools preferred
overwhelmingly that their applicants present transcripts for
evaluation that contain a predominance of letter grades.
Yet a five-to-three majority indicated a reluctant ac-
ceptance of transcripts containing nothing but Passes if
additional information about the applicant were available
(Hassler, 1969). The Law School Admission Test Council,
with representatives from almost every law school in the
country, recently published a formal statement warning
about the consequences to law school admissions of even
partial Pass-Fail grading (Law School Admission Test
Council, 1970).

Typically, selection to graduate and professional schools 
is made from a pool of applicants much larger than the 
number to be admitted. Since most of the applicants are 
reasonably well qualified, distinguishing between the poorer 
of those admitted and the better of those rejected requires
fine discrimination. Gradepoint averages from under-
graduate institutions, in spite of their deficiencies, permit
such hairline distinctions, and the abundance of qualified 
applicants serves to keep selection errors low. Virtually all 
those selected are capable of acceptable performance. 
Errors of rejection may be more numerous but, by their 
nature, are seldom observed and present no problems for
the institution. 

The question of error in admission decisions highlights
one of the problems in assessing the usefulness of grades in 
selection to higher educational institutions. For error to be 
measured, some definition of “correct” decisions is re-
quired. An admitted student who earns good grades and
completes the course of study is considered to represent a 
“correct” admission decision. But deans and faculty mem- 
bers often deny that high probability of earning good 
grades is, by itself, an adequate basis for admission, and the 
correctness of decisions to reject applicants is neither
defined nor measured. The usefulness of grades as a 
selection criterion cannot be adequately assessed until the 
purposes of selection are better defined. 

An unexamined question in educational selection is
whether an institution’s educational resources should be
denied to those not likely to receive high grades. Scriven
(1969) indirectly raised this question by stating that one of
the three essential functions of grades is to provide “a basis
for the allocation of scarce resources to those who can use
them best (p. 114).” He did not attempt a definition of
what constitutes the “best” use, and neither have others.
The assumption that students who receive high grades have
made better use of an institution’s resources than have
students who receive low grades may be justified. It has
been questioned, however, and merits attention (Jencks and
Riesman, 1968; Woodring, 1968). Furthermore, the best
use of an institution’s resources varies with the purposes of
the institution. Medical schools, law schools, graduate
schools of business, graduate schools of social work, and
other institutions that award advanced professional degrees
obviously differ in their purposes.

Even among institutions of the same type, purposes may 
differ. Some law schools, for example, consider their major 
function to be preparing students to pass the bar examina-
tion and enter legal practice. Others consider that purpose
secondary to providing a legal education to all who might 
benefit from it, whether they become practicing attorneys 
or not. Still others place great importance on graduating 
those likely to produce advances in the present system of 
jurisprudence. Differences in educational purpose might 
imply different selection procedures, yet all depend heavily
on undergraduate gradepoint averages.

The departments within a graduate school might also 
vary in their selection criteria. Gamson (1967), for ex-
ample, showed that faculty members in the physical and
social sciences differed consistently in their expectations
for their students and in their own role perception in
relation to students. (See also Riesman, Gusfield, and
Gamson, 1970.) Yet studies have not been carried out that
would allow selection procedures to be geared to different
institutional or departmental purposes.

Anxiety over a possible threat to the selection function
led to a survey of colleges having Phi Beta Kappa chapters
for an assessment of the difficulties the growing use of
Pass-Fail grading might present in electing students to
membership. The Committee concluded that the use of
Pass-Fail grading was not yet much of a threat to adequate
evaluation of students because even when used it seldom
constituted more than a fraction of a student’s grades. The
Committee further stated that grades should not be the
only consideration in election to Phi Beta Kappa. In fact,
two of the Committee’s four recommendations urged
de-emphasis of grades in election to Phi Beta Kappa
(Pass-Fail Study Committee, 1969).

If the probability of earning good grades is accepted as 
the most justifiable basis for selection of students, problems
still remain when gradepoint averages are used as a selection
criterion. The effectiveness of previous grades as predictors
of later grades has been examined extensively but with little
depth, as is indicated by the large proportion of grade
prediction studies that are doctoral dissertations. Ex-
perienced researchers with the resources to probe an issue
deeply seem to find other problems more interesting.

For most students previous grades do predict later grades
moderately well over relatively short time periods. Under-
graduate grades predict first-year grades in graduate and
professional schools moderately well, but they predict more
advanced grades poorly, particularly in clinically oriented
programs (Bartlett, 1967; Gough, 1967; Gohm, 1968;
Hanlon, 1964). The number of studies that show negligible
relationships between undergraduate and graduate school
grades suggests that prediction is a rather selective process,
operating differently for different people in different
situations. If an institution should decide that the predicted
gradepoint average is to be the dominant factor in deciding 
admission, the difficulty of predicting that average is still 
substantial. 

The problem in depending heavily on undergraduate
gradepoint average for selection to graduate and profes-
sional programs can be illustrated by showing the implica-
tions of a correlation coefficient of .30 between under-
graduate gradepoint average and first-year graduate school
grades. If the distributions of both grade averages are
symmetrical and approximate a bell-shaped curve, a correla-
tion coefficient of .30 will occur when 20 percent of the
total group of students drop from the top half of the
distribution of undergraduate grades to the bottom half of
the graduate school grade distribution. Another 20 percent
will move from the lower half with respect to under-
graduate grades to the upper half with respect to graduate
school grades. About 60 percent will achieve graduate
school grades that put them in the same half of the grade
distribution as did their undergraduate grades.
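
The proportions in the preceding paragraph can be checked with a short calculation. The sketch below assumes that the two grade averages follow a bivariate normal distribution, which is a stronger assumption than the symmetrical, bell-shaped distributions the text requires. It uses the standard result that the probability of falling above the median on both measures is 1/4 + arcsin(r)/(2π). The function name is illustrative.

```python
import math


def half_shift_proportions(r: float) -> dict:
    """Proportions of a bivariate normal population, by half of each distribution.

    r is the correlation between the two measures (here, undergraduate
    and first-year graduate school grade averages).
    """
    both_upper = 0.25 + math.asin(r) / (2 * math.pi)
    upper_to_lower = 0.5 - both_upper  # top half before, bottom half after
    return {
        "upper_to_lower": upper_to_lower,
        "lower_to_upper": upper_to_lower,  # equal by symmetry
        "same_half": 2 * both_upper,
    }


for name, value in half_shift_proportions(0.30).items():
    print(f"{name}: {value:.3f}")
```

For r = .30 the three proportions come out at roughly 20, 20, and 60 percent, in line with the figures given above.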

A sharper picture of a correlation coefficient of .30 can 
be seen by comparing the numbers of students who will
shift their position with respect to grade quintiles. Only
one-third of the students in the top 20 percent with respect
to undergraduate grades will remain in the top 20 percent
with respect to graduate school grades. Correspondingly,
only one-third of those in the bottom fifth at entrance will
remain there. Ten percent of the top one-fifth will drop all
the way to the bottom fifth in graduate school, and ten
percent of the bottom fifth with respect to undergraduate
grades will move to the top fifth in graduate school. Greater
numbers will move from the second to the fourth quintiles 
and from the fourth to the second. Clearly, a correlation 
coefficient of .30 would indicate a substantial amount of 
change in performance between college and graduate 
school. 
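
The quintile figures can be checked in the same spirit. The sketch below again assumes a bivariate normal distribution of the two grade averages and integrates numerically with the midpoint rule; the function name, the step count, and the upper integration limit are choices made for this illustration only.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def joint_upper_tail(r, h, k, steps=20000, upper=8.0):
    """Approximate P(X > h and Y > k) for a standard bivariate normal.

    Integrates pdf(x) * P(Y > k | X = x) over x from h to `upper`
    using the midpoint rule. Given X = x, Y is normal with mean r * x
    and standard deviation sqrt(1 - r**2).
    """
    width = (upper - h) / steps
    scale = math.sqrt(1.0 - r * r)
    total = 0.0
    for i in range(steps):
        x = h + (i + 0.5) * width
        conditional_tail = 1.0 - STD_NORMAL.cdf((k - r * x) / scale)
        total += STD_NORMAL.pdf(x) * conditional_tail
    return total * width


r = 0.30
cut = STD_NORMAL.inv_cdf(0.80)  # lower boundary of the top fifth

# Share of the top undergraduate fifth that stays in the top fifth.
stay_in_top_fifth = joint_upper_tail(r, cut, cut) / 0.20

# P(X > cut, Y < -cut) = P(X > cut) - P(X > cut, Y > -cut)
top_to_bottom_fifth = (0.20 - joint_upper_tail(r, cut, -cut)) / 0.20

print(f"stay in top fifth: {stay_in_top_fifth:.2f}")
print(f"top fifth to bottom fifth: {top_to_bottom_fifth:.2f}")
```

With r = .30 the two proportions come out near one-third and one-tenth, in line with the figures in the text. By symmetry the same values apply to the bottom fifth.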

The figure .30 has been chosen for illustration because it
is close to the median value that has appeared in a large
number of studies predicting first-year graduate school
grades from undergraduate gradepoint average. Among
about 40 studies reported since 1965, involving various
kinds of graduate and professional schools and with several
studies including from 10 to 25 different institutions, the
correlations between undergraduate and first-year graduate
school grades fluctuated rather widely on either side of .30.
A report of the correlations obtained in graduate schools of
business is illustrative. For the first-year classes in 1967-68
at 26 graduate schools of business, the median correlation
between undergraduate and graduate school grades was .28
(Pitcher, Deemer, and Smith, 1968). For 19 of the same
schools the mean correlation coefficients in 1954 and 1958 
were .28 and .34, respectively (Pitcher and Winterbottom, 
1965). 

Klein and Evans (1968) reported correlations between 
undergraduate grades and first-year law school grades
among seven law schools that ranged from .11 to .43 with a
median of .33. Pitcher (1965) found similar relationships in
10 law schools for students entering in 1962, the correla- 
tions ranging from .10 to .39 with a median of .27. In a 
more recent report of students entering five law schools in 
1966 (Schrader and Pitcher, 1970), four of the correlations 
between undergraduate grades and first-year law school 
grades were between .27 and .37. The fifth was .20. 

Medical schools, dental schools, schools of social work, 
schools of education, and a school of veterinary medicine
have shown similar results. Most of the individual correla-
tion coefficients reported fall in the range from .10 to .50,
clustering around .30 (e.g., Boldt, 1970; Bundy, 1968;
Gough, 1967; Hepworth, 1969; Lunneborg and Lunneborg,
1966; Roemer, 1965). 

In various graduate school departments the correlations 
between undergraduate and graduate school grades range 
somewhat more widely, from about -.20 to .60 (Hackman, 
Wiggins, and Bass, 1970; Lannholm, 1968a; Lannholm, 
Marco, and Schrader, 1968; Mehrabian, 1969; Stordahl, 
1967; Wiggins, Blackburn, and Hackman, 1969). In view of 
the great variability of the correlation coefficients and the 
fact that the extreme values tend to occur with samples of 
fewer than 100 students, little can be said with confidence 
about the relationship to be expected between under- 
graduate grades and graduate school performance. In 
selected circumstances the relationship may be quite strong, 
but what might produce those circumstances has not been 
identified. 

The studies reviewed above, almost without exception, 
involved predictions of first-year graduate and professional 
school grades. Since predictions of second, third, and 
fourth-year grades can be expected to be successively lower, 
the utility of undergraduate grades as a device for making 
any but the grossest decisions about admission to graduate 
schools seems questionable. Since admission must continue 
to be selective as long as applicants far outnumber those 
who can be admitted, the alternative is to find more 
specific student attributes or combinations of attributes 
that are pertinent to the performance the selecting institu- 
tion expects from its students. These attributes are not easy 
to specify; but until they are, selective admission will not 
be a very well developed process. 

The weak relationships between undergraduate and 
graduate school grades can be excused on several grounds. 
Graduate students are a selected group; therefore, the 
distribution of undergraduate grades has been truncated. 
Graduate school grades also have a limited range, often 
consisting only of A’s and B’s. Yet pointing out reasons why a 
predictive relationship is not high does nothing to improve 
the usefulness of the predictions. If graduate school grades 
cannot be predicted without substantial error, other criteria 
for selection should be sought, perhaps based on the 
particular purposes of the selecting institution or on other 
student characteristics desired by the faculty, such as those 
reported by Davis (1965) and Hilton, Kendall, and Sprecher 
(1970). 
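
The truncation argument can be made concrete with the standard 
range-restriction formula from psychometrics (often labeled 
Thorndike's Case 2), which is not part of the original report. The 
sketch below is only an illustration under stated assumptions: 
selection is made directly on the predictor, the relationship is 
linear with constant error variance, and both the unrestricted 
correlation of .50 and the standard-deviation ratios are hypothetical 
values rather than estimates from any study cited here. 

```python
import math


def restricted_correlation(r_full, sd_ratio):
    """Correlation expected in a selected (range-restricted) group.

    r_full   -- correlation in the full, unselected group (assumed value)
    sd_ratio -- predictor SD in the selected group divided by the
                predictor SD in the full group (1.0 means no selection)

    Assumes direct selection on the predictor, a linear relationship,
    and constant error variance.
    """
    if not -1.0 <= r_full <= 1.0:
        raise ValueError("r_full must lie between -1 and 1")
    if sd_ratio <= 0:
        raise ValueError("sd_ratio must be positive")
    scaled = r_full * sd_ratio
    return scaled / math.sqrt(1.0 - r_full ** 2 + scaled ** 2)


if __name__ == "__main__":
    # Hypothetical full-range correlation of .50 under increasing truncation.
    for ratio in (1.0, 0.75, 0.5):
        print(ratio, round(restricted_correlation(0.50, ratio), 2))
```

Under these assumptions, halving the spread of undergraduate grades 
among admitted students would lower a full-range correlation of .50 
to about .28. This accounts for a low observed value without, as the 
text notes, making the prediction any more useful. 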

The heavy reliance on the gradepoint average in admis- 
sion to advanced educational programs despite its question- 
able validity seems due to two factors. One is its adminis- 
trative convenience. Since it is quantified, it has the 
appearance of accuracy and permits decisions based upon it 
to be objective. Decisions can then be made mechanically— 
which is often what is meant by objectivity. A comment 
from a respondent to a survey conducted by the Council of 
Graduate Schools (Hassler, 1969) illustrates this point: 
“Our Graduate School requires a 2.500 average on a 4.000 
scale.” What lies behind those three-decimal numbers 
remains unknown and unquestioned. 

The second favorable aspect of the gradepoint average is 
its academic respectability. It reflects the combined judg- 
ments of a number of faculty members, people expected to 
make judgments from points of view similar to those of the 
faculty members in the selecting institution. It operates, 
therefore, like a set of recommendations to an exclusive 
club written by long-time members who know the kind of 
people the other club members prefer. This is a harsh 
judgment and probably overstated. Generally, grades are 
the result of conscientious efforts at evaluation and of 
thoughtful, at times agonizing, decisions about grade 
assignments. They reflect the best judgments available 
about capabilities faculty members consider important. But 
the exclusive-club analogy again becomes appropriate, 
because no one can say just what kind of capabilities a 
faculty member had in mind when he evaluated his students 
and assigned grades. So grades and the gradepoint average 
are left with little more than their academic respectability 
vouched for by a member in good standing of the proper 
kind of club. 

Parenthetically, the readiness of business firms (a dif- 
ferent sort of club) to accept the recommendations of 
academic institutions is strange, particularly when recom- 
mendations of employers that say a person performed some 
business function very ably will have no influence at all in 
getting that person admitted to an academic institution. 

The claims of validity for the gradepoint average and for 
its acceptability as the primary admission criterion rest on 
more than respectability, however. As described above, it 
does predict later grades moderately well a fair proportion 
of the time if the later grades are not too much later. But 
even this has a questionable circularity about it, showing 
only that similar kinds of judges will arrive at somewhat 
similar kinds of judgments about academic performance. 
The validity of both sets of judgments ought to rest on a 
different kind of evaluation. 

Academic achievement tests appear to provide that 
different kind of judgment on which the validity of grades 
can be based. Scores on the Graduate Record Examina- 
tions, for example, sometimes predict graduate school 
grades moderately well and can themselves be predicted 
from the undergraduate gradepoint average (Lannholm, 
1968b). But the circularity is not really broken because 
both aptitude and achievement examinations are con- 
structed explicitly to predict grades and derive their validity 
entirely from them. 

Westland (1969) concurs in stating that in order for 
college degrees, and by inference grades, to have social 
significance, they must have their meaning validated by 
social, not academic, criteria. 

I contend that at the moment we just don’t know, in the 
scientific sense, what we are assessing. The problem is the 
criterion problem. We risk chaos if we don't look beyond our 
own navels for justification of what we are doing (Westland, 
1969, p. 360). 

Woodring made a similar point, though not quite as 
cogently. 

But no one can seriously believe that grades are the goal of 
higher education. And the assumption that those who make 
high grades are the ones who profit most from their education 
and are most likely to make the greatest contribution to 
society after graduation should be re-examined, for it must 
withstand a considerable amount of contradictory evidence 
(Woodring, 1968, p. 4 2). 

Nevertheless, grades and gradepoint averages cannot be 
dismissed. The pooled judgments of intelligent people are a 
far sounder base for decision than is available otherwise. 
While Woodring (1963) contends that “grades have little 
meaning except as evidence of readiness for more formal 
education,” Westland's view seems sounder. Grades prob- 
ably do represent something useful; we just don’t know 
what it is. 



Selection for employment 

The extent of the use of grades as selection criteria by 
employers is uncertain. Some put heavy weight on grades; 
others use them only for very coarse screening; still others 
use them not at all (Calhoon and Reddy, 1968; Committee 
on Grading, 1970; Dickenson, 1955; Rappel, 1962; Mid- 
west College Placement Association, 1964-65; Paquette, 
1966; Waivers and Bray, 1963). Those employers who 
depend heavily on grades tend to have strong convictions 
but little evidence of their value. Law firm representatives 
interviewed by the Committee on Grading at the University 
of California School of Law, for example, were quite 
vehemently in favor of a detailed grading system, stating 
that students much below the top of the class just would 
not be adequate for work in their firms (Committee on 
Grading, 1970). Yet their failure to hire any but top 
students makes one wonder on what evidence that policy 
was based. 

Hoyt (1965, 1966, 1968) reviewed the studies he could 
find up to 1965 that related college grades to occupational 
success. The studies were scarce and their results equivocal. 
Hoyt concluded that “college grades have no more than a 
very modest correlation with adult success no matter how 
defined” (Hoyt, 1965, p. 45). Studies reported since Hoyt’s 
review provide evidence on both sides of the issue but his 
overall conclusion remains valid (Calhoon and Reddy, 
1968; Heckman, Banas, Lazenby, and Moore, 1969; 
Kinloch, 1969; Mason, 1965; McCiaine, 1968; Pigge, 1968; 
Porter, 1969; Salyer, 1968-69). 

The low relationship between college grades and occupa- 
tional performance does not mean grades are useless 
determinants for employment selection (Hoyt, 1966; 
Raimi, 1967). College grades should indicate the level of 
student performance in academic tasks associated with 
understanding a given body of knowledge. This kind of 
performance may or may not be similar to the performance 
required on the job. If the gap some businessmen see 
between the academic world and the practical world really 
exists, employers should not expect grades to be related to 
job success. In Raimi’s view, job success depends much 
more on experiences and capabilities developed after being 
hired than on the few years of college courses that precede 
employment. Therefore, job experiences in a few years 
heavily outweigh any college effects (Raimi, 1967). 

If it is true that the effects of college are soon 
overshadowed by employment experience, one might ask 
why some employers stress grades so heavily. One reason is 
that good grades may indicate a facility for learning that 
will help a person acquire the knowledge and skills necessary 
for good job performance. Another is that some of the 
knowledge and understanding acquired in college may be 
necessary as a starting point for developing the additional 
knowledge and skills required on the job. The relationship 
between college and job performance would then become 
attenuated with time, a phenomenon that has been ob- 
served (Kinloch, 1969). These reasons for the declining 
relationship, however, are suppositions neither supported 
nor refuted by evidence. 

A possible reason that some studies show moderate and 
others negligible correlations between grades and job 
performance is the greater importance in some job settings 
of compliance or willingness to follow instructions un- 
critically. Some evidence exists that this personal quality is 
associated with grades (e.g., Domino, 1968; Holland, 1960; 
Pemberton, 1969) and its importance in some kinds of jobs 
may be presumed. The nonacademic qualities of agreeable- 
ness, personableness, compliance, and sensitivity to the 
instructor's preferences that please faculty members can 
also be expected to please job supervisors. 

Other differences in job requirements may also account 
for the varied results in predicting job performance from 
college grades. Heckman, Banas, Lazenby, and Moore 
(1969) found correlations between grades and the salary 
progress of managers in a large manufacturing company to 
be highest for engineers and lowest for those in purchasing 
and traffic departments. If, as seems likely, both grades and 
job performance are multidimensional, correlations be- 
tween them will fluctuate widely, depending on how the 
determinants of each complex variable happen to be 
combined into a single measure, and what relationship 
exists between their primary components. If academic 
grades are used in employment selection, more needs to be 
known of the structure of both grades and job per- 
formance. Determining relationships among selected com- 
ponents of the two kinds of performance may be useful. 
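
The dependence of the correlation on how components are combined 
can be shown with a small calculation that is not in the original 
report. Assume, purely for illustration, that a grade composite and a 
job-performance composite each add one shared component to one 
component unique to that measure, and that all components are 
uncorrelated with unit variance. The weights and function name below 
are hypothetical. 

```python
import math


def composite_correlation(grade_weights, job_weights):
    """Correlation between two weighted composites.

    Each argument is (weight_on_shared_component, weight_on_unique_component).
    Assumes all components are uncorrelated and have unit variance, so only
    the shared component contributes to the covariance.
    """
    shared_g, unique_g = grade_weights
    shared_j, unique_j = job_weights
    covariance = shared_g * shared_j
    sd_grade = math.hypot(shared_g, unique_g)  # sqrt of summed squared weights
    sd_job = math.hypot(shared_j, unique_j)
    if sd_grade == 0 or sd_job == 0:
        raise ValueError("each composite needs at least one nonzero weight")
    return covariance / (sd_grade * sd_job)


if __name__ == "__main__":
    # Same shared component, different (hypothetical) weightings.
    print(round(composite_correlation((1, 1), (1, 1)), 2))
    print(round(composite_correlation((1, 3), (1, 3)), 2))
```

With equal weights on the shared and unique components the composites 
correlate .50. When each unique component is weighted three times as 
heavily the correlation falls to .10, although the shared component 
itself is unchanged. 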

Motivating students 

A second widely asserted purpose of grades is to act as 
“motivators,” that is, to induce students to apply them- 
selves to learning things they would not learn if not graded. 
Students and faculty alike believe that grading does have 
that effect (Katz and Associates, 1968; Sparks, 1969; 
Stallings and Leslie, 1970), and studies of Pass-Fail grading 
have indicated that the nature of the grade does influence 
how students will allocate their study time (Ericksen, 1967; 
Feldmesser, 1969; Freeman, 1969; Karlins, 1969; Milton, 
1967; Morishima and Micek, 1970). But the available 
evidence is too superficial for conclusions about motivating 
effects of grades to be held with any confidence. 

The studies cited above showing the effects of Pass-Fail 
grading on allocation of study time demonstrate that 
students put less effort into Pass-Fail courses than into 
other courses. Each of these studies, though, was concerned 
with optional Pass-Fail grading. The students were per- 
mitted to take one Pass-Fail course per term; all other 
courses were graded A through F. Almost invariably in 
these circumstances students slighted the Pass-Fail course. 
But this can hardly be considered a damaging criticism of 
Pass-Fail grading. 

Pass-Fail options typically exclude courses in the 
student’s major field. That students should emphasize 
courses in their major field at the expense of other courses, 
often taken only to satisfy an institutional requirement for 
breadth, should not be cause for concern. The opportunity 
given students to allocate their study time selectively seems 
as much an argument in favor of Pass-Fail grading as against 
it. 

Evidence from studies of limited Pass-Fail options is 
inadequate to evaluate the effects of Pass-Fail grading 
applied throughout an institution. Where complete Pass-Fail 
grading or purely descriptive grading has been instituted, no 
evidence has been found that students put less effort into 
their studies than they would under any other grading 
system. Sarah Lawrence College has operated without 
grades for many years (Murphy and Raushenbush, 1960), as 
have a number of other liberal arts colleges. One depart- 
ment of the University of California Medical School was 
successful with a system of faculty comments instead of 
grades for a number of years until the faculty, over strong 
student objections, returned to a more conventional system 
of grading (Marshall, 1968). At the University of Kansas 
Medical School a shift to Pass-Fail grading seemed to reduce 
competition between students to a slight extent but had no 
discernible effect on student effort. The contest of students 
pitted against faculty, in which the students work to get 
past the obstacles the faculty throw in front of them, 
continued. The contest between student and student to see 
who could outperform the other had never been great 
(Becker, Geer, Hughes, and Strauss, 1961). Horowitz 
(1964), also at a medical school, found no decline in 
student effort after all grades were abolished, but did find 
that the appearance of industriousness or of lack of interest 
can both be misleading. Finally, at the University of 
California at Santa Cruz, where complete Pass-Fail grading 
has been the practice since the opening of the institution in 
1965, faculty members saw no evidence that students 
worked less diligently than had students at other institu- 
tions having more conventional grading systems 
(Committee on Educational Policy, 1970). 

The experimental program that comes closest to pro- 
viding a useful comparison between a graded and an 
ungraded instructional system is that followed at six liberal 
arts colleges, in which selected students pursued a 4-year 
program of independent study without specified course 
requirements and without grades (Cole, 1966; Operation 
Opportunity, 1970; A Report on the Independent Study 
Program, 1970). Within the same institution, some students 
worked under the usual grading system while others were 
freed completely from grading requirements. This does not 
mean the students in the experimental programs were not 
evaluated; they were. But the results of those evaluations 
were communicated directly to the student without re- 
cording a grade. The consequences of these programs cannot 
be attributed to the absence of grades for two reasons. The 
students were carefully selected, and many elements of the 
experimental program other than the absence of grades 
could have been responsible for its effects. Nevertheless, 
some inferences about the effects of grades can be drawn 
with no more recklessness than is involved in most of the 
current opinions about grades. 

Some evidence of the effect of grades as motivators may 
be observed, in that students in the experimental program 
often chose not to do some of the things that would have 
been required in regular courses. They tended, for example, 
to do less writing than was required of other students. But 
they did study and they did learn, although probably in 
ways not as obviously well-ordered as some faculty mem- 
bers would have liked. At the end of the first 4 years of the 
experiment the graduates included a number of Phi Beta 
Kappas and Woodrow Wilson fellows. 

A tentative conclusion from reports of the programs is 
that grades played only a small part, if any, in inducing 
students to learn. On the other hand, the examination 
procedures, whether a grade was to be assigned or not, did 
guide the students’ academic behavior. Impending examina- 
tions often induced intense anxiety, even though no grade 
was to be given. 

The primary source of student discomfort in the 
program, which was often great, seemed to be neither the 
absence of formal grades nor even the lack of structure. 
Instead, it was the ambiguity of many aspects of the 
program, due partly to its newness. The students often were 
not sure what was expected of them, were not ready to 
believe that they could, with their preceptor’s guidance, set 
their own expectations, and were uneasy over their own 
evaluation of their progress. 

It is not surprising that abandonment of customary 
guidelines and indicators should lead to anxiety and 
discomfort. An unusual kind of student is needed to 
manage it. When an entire college, such as Sarah Lawrence 
or the University of California at Santa Cruz, changes the 
guidelines, the effects are much less severe. But even where 
the students were a very small group in a new and sharply 
divergent program, the absence of formal grades did not 
lead students to squeeze through with as little effort as 
possible. The students either performed well or voluntarily 
withdrew to return to a more familiar academic environ- 
ment and to whatever constraints course grades impose. 
The program was clearly not an invitation to indolence. 

One conclusion that seems justified is that different 
kinds of students respond differently to different peda- 
gogical procedures. While some students need the formal 
affirmation of accomplishment that a final grade gives them 
and will direct their efforts toward that goal, others find 
the constraints of grades onerous. This should hardly be 
surprising and has been reported before (Becker, Geer, 
Hughes, and Strauss, 1961; Birney, 1964; Horowitz, 1964; 
Miller, 1967). 

In several studies, students have been observed closely 
enough and over a long enough period of time for informed 
judgments about motivational processes to be made 
(Becker, Geer, and Hughes, 1968; Becker, Geer, Hughes, 
and Strauss, 1961; Horowitz, 1964; Murphy and 
Raushenbush, 1960). As in the experimental programs 
described above, students were intensely concerned about 
their academic performance as a basis for their own 
self-evaluation and the satisfaction that results from a sense 
of competence. But the information they used for self- 
evaluation came from a wide variety of sources, not just 
from grades. 

Students’ needs for formal certification of achievement 
are an externally imposed incentive to study. The desire to 
perform well simply for the resulting sense of satisfaction is 
more internally based. Studies of the Pass-Fail option 
suggest that the external reward may override the internal 
one. Yet the desire for competence, as assessed by the 
student himself and as revealed in a variety of ways by 
teachers and by other students, provides a strong motiva- 
tional force in many students. 

The distinction between extrinsic and intrinsic sources 
of reward has been given as one reason for the inadequacy 
of grades as motivators (Committee on Educational Policy, 
1970; Karlins, 1969; Miller, 1967). The extrinsic-intrinsic 
distinction, however, is not always clear. The student’s own 
self-assessment and intrinsic satisfaction, as Becker, Geer, 
and Hughes (1968) and Horowitz (1964) have shown, 
depend largely on external sources. When self-evaluative 
procedures and opportunities are limited, as sometimes 
happens concurrently with a de-emphasis of grading, many 
students become uncomfortable. But the anxiety is likely 
to arise not from the absence of grades but from lack of an 
opportunity for self-assessment. Grades at the end of a 
course only act as formal confirmation of the self- 
assessments students have been making regularly. Disputes 
between students and faculty members over grades occur 
when the grade does not confirm the student's previously 
formed self-assessment. 

So far the motivating effect of grades as rewards for 
which students work has been considered. But grades are 
also used punitively by faculty members to coerce students 
into class attendance, performance of assigned work, and 
general deportment of the sort that pleases the teacher 
(Buchman, 1970; Dressel and Nelson, 1961; Goodman, 
1964; Mayhew, 1969; Schwab, 1954; Wallace, 1966). The 
reluctance of some faculty members to change the grading 
system seems due to a fear that without the coercive effect 
of grades the teacher would lose most of his influence over 
student performance (Mayhew, 1969). The possibility that 
students would not attend a professor’s lectures or follow 
his directions for study if they were freed from the 
demands of grades can be a frightening prospect. Holding to 
grades to avoid facing that prospect is more comfortable. 

In summary, the motivating effect of grades is complex 
and not well understood. Some students value the formal 
affirmation of accomplishment that grades represent and 
work to get it. For others the almost continual self- 
assessment derived from cues provided by teachers, other 
students, and regular course activities is sufficient. Published 
grades at the end of a course have little additional 
motivational effect for these students. 

Another point basic to the use of grades as motivators 
should be mentioned, although it will not be developed at 
length. It is the question of whether faculty members 
should be concerned at all with devices to induce students 
to study. As colleges increasingly abandon the role of 
surrogate parent with respect to the social behavior of 
students, coercing students into desired patterns of activity 
by faculty-administered rewards and punishments might 
also be abandoned as unnecessarily paternalistic. 

The informative function of grades 

The first two purposes of grading, as selection devices 
and as motivators, can both be considered services pri- 
marily to society rather than to students. The use of grades 
as selection devices permits higher education to perform its 
function as a social sieve, determining who shall be 
admitted to positions of prestige, power, and financial 
reward (Caplow, 1954; Clark, 1962; Jencks and Riesman, 
1968; Mayhew, 1969; Sexton, 1967; Sparks, 1969; Tyler, 
1969). Grades as motivators also serve society's purposes, 
inducing students to kinds and levels of performance they 
presumably would not choose freely. Although some 
students benefit from the use of grades in selection, and 
although grade-induced studying may also be considered 
beneficial to the student, the primary service is to society. 
In contrast, the third most commonly discussed purpose of 
grades, which receives far less attention than the first two, 
is their use as a device to serve students by informing them 
about their performance. 

The contention that feedback to students about their 
performance constitutes an important purpose of grades 
(Committee on Grading, 1970; Dale, 1969; Sparks, 1969) 
confuses evaluation (the assessment of performance) with 
grading (reporting of the assessment results). A variety of 
procedures are available to inform students about the 
nature of their performance without the publication of a 
summarizing symbol to represent overall performance in a 
course. 

Becker, Geer, and Hughes (1968) and Horowitz (1964) 
found that students used a variety of cues to assess their 
level of performance relative to other students. Stallings 
and Leslie (1970) reported a survey of students at the 
University of Illinois in which most students did not 
consider grades to perform a useful feedback function. 
Students are seriously concerned with self-evaluation and 
tend to become anxious in the absence of evaluative 
information about their performance (Funkenstein, 1968; 
Horowitz, 1964). But course grades, since they do not 
appear until course completion and are limited in content 
to the information that can be carried by a single symbol, 
are not effective feedback devices. The evaluative pro- 
cedures that lead to the most effective feedback are often 
not those that lead to the most useful ranking of students 
(Bloom, 1968; Husek, 1969). And relative rank on a global 
evaluation is not very informative at best. 

Effective feedback helps students judge their progress on 
their own terms. Acquiring a general grasp of the major 
issues may be all a student wants from a course outside his 
major field but is far from adequate in a course important 
to his major. Effective feedback also leads to modifications 
in student behavior that will improve performance or to 
assurance that performance is adequate. It should indicate 
areas of weakness or topics insufficiently understood. 
Successful and unsuccessful methods of study should be 
identified soon enough to permit adjustments to be made. 

Feedback should be related to the processes as well as 
the products of learning, differentiating among various 
forms and areas of academic accomplishment and indicating 
directions for future study. It is most effective when 
considered in relation to the student's previous accomplish- 
ment and capabilities. Performance in relation to other 
students has limited usefulness for feedback and is at times 
misleading, as when the other students in a class do not 
constitute a useful reference group for some particular 
student. In Scriven’s terms, feedback is a product of 
formative evaluation, grades of summative evaluation 
(Scriven, 1967). 

If the objective of evaluation is to rank students for 
some purpose that requires a relative assessment of overall 
accomplishment, observation of the procedures a person 
goes through in arriving at a result is not important; 
whether he arrives at the desired or correct result is 
important. Final course grades, constituting a coarse rank- 
ing of students, indicate roughly what a student has 
accomplished academically in that particular course com- 
pared with other students in the course. They convey useful 
information primarily to people who were not engaged in 
the course. Students and teachers learn little from them. 

For these reasons, conveying information to students 
should not be considered an important function of grading. 
The information grades convey to students tells them what 
information admissions officers and employers will have 
about them on which selection decisions may be based. 
Grades also convey the instructor's overall judgment of the 
student's total performance, which may help him decide 
about future work in the field of the course. But this is a 
low order of information in comparison with what the 
student has learned of his capabilities directly throughout 
the period of the course. The educational function of 
grades is therefore limited, both because they are assigned 
after the learning is completed and because they are little 
more than general summaries of information students have 
probably already received by other means. 

Institutional purposes of grades 

One of the major administrative purposes of grades is in 
selection to graduate and professional education, a purpose 
that does not directly serve the interests of the institution 
awarding the grades. The grade-awarding institution does 
use grades, however, for a variety of internal administrative 
purposes. The most important is probably in decisions 
about whether to permit students to re-enroll in succeeding 
terms. Although this use of grades is critical for only a small 
proportion of students, over a period of years it excludes 
large numbers of students from further education. Its total 
social effect is therefore substantial, constituting an im- 
portant way that the segment of the population permitted 
access to higher occupational, economic, and social posi- 
tions is defined. 

At three stages in the educational process (admission to 
college, retention in college, and admission to advanced 
education) grades exercise a substantial influence on deci- 
sions about who shall be permitted to continue. The 
assumption that grades constitute a defensible basis for 
these decisions has some rational justification. Teachers 
prefer students whom teachers before them have preferred. 
But whether the elements of performance that determine 
teacher preferences coincide substantially with the elements 
of performance on which decisions about continued educa- 
tion should be based is a question that has not been 
examined. As Scriven (1969) and Stake (1970) have urged, 
research is needed to determine how decisions about 
allocation of limited educational resources can most justi- 
fiably be made. 

Other institutional purposes are in determining admis- 
sibility to advanced courses, eligibility for extracurricular 
activities, awarding financial aid, and awarding academic 
honors. In these areas grades may be a sound basis for 
decision. The awarding of academic honors, for example, is 
by definition based on grades. The award of scholarships 
and other financial aid on the basis of grades is more 
questionable. Typically, a student who needs financial 
assistance has the form of that assistance (whether outright 
grants or loans and part-time work) based on his grades. 
The justification for this practice is similar to but less 
defensible than the justification for basing selection on 
previous grades. The rationale, that better performing 
students, in terms of the behavior indicated by grades, are 
more deserving of financial help than other students, 
cannot have been adequately examined in view of the 
limited knowledge of what grades represent. 

Grades are said to provide important information to 
teachers, permitting them to judge their own effectiveness, 
and to department heads and other administrators, per- 
mitting them to make comparative evaluations of teachers 
and departments. Grades are completely unnecessary, how- 
ever, to teachers’ self-evaluation. Evaluation of student 
performance is essential; grading is not. 

Similarly, teachers, departments, and divisions are often 
compared with respect to their grade distributions. These 
comparisons, however, provide no more information than 
how teachers, departments, and divisions compare in the 
grade distributions they produce. How this information is 
to be interpreted is largely unknown. Whether consistently 
low grades in a department result from poor students, poor 
teaching, an inappropriate combination of teaching method 
and student characteristics, poor evaluation, or inordinately 
high standards cannot be determined from comparisons of 
grade distributions. 



This is not to say that comparing grade distributions is 
useless. It may suggest why student attrition is so great in 
one department. Unusually low grades assigned consistently 
by the same teacher may indicate a particularly critical, 
demanding instructor or they may suggest an underlying 
attitude of hostility toward students that interferes with 
instruction and learning. Additional information might then 
be gathered to determine the reasons for unusual grade 
distributions. But grade distributions in themselves say 
almost nothing about the teaching or learning that 
occurred. 

Grading as preparation for life 

Grades have occasionally been said to be desirable in
preparing students to face the competition they will
inevitably meet in the “real world” beyond school. This
view seems to be a relic from an earlier day in which college
was a pleasant, undemanding way for sons of the social and
economic elite to spend a few years before moving fully
into the adult world. Whether or not it was ever widely
justified, it certainly is not today. To consider college
experiences as not belonging to the “real world,” whatever
that may be, seems absurd.

Few nonschool situations, in employment or elsewhere, 
have anything resembling the grading procedures of college. 
Even in employment, evaluation through the use of written
tests is not particularly common. Civil Service procedures 
may come close to some aspects of college grading, but the 
Civil Service is not typical of most employment situations 
and its similarity to college is limited. 

A vast amount of evaluation does go on in almost every 
kind of situation, but most of it is highly informal, ad hoc, 
and far removed from anything like college grading. Yet 
even if situations were common outside college in which
grading much like that in college occurred, this would not 
in itself give colleges the responsibility to prepare students 
for those situations. Even colleges that assert one of their 
purposes to be preparing students for life do not claim to 
prepare students for every kind of situation they may face. 
Preparation for the competition of examinations and 
grading does not have demonstrable value. 



V. UNINTENDED EFFECTS OF GRADES 



While the intended purposes of grades have seldom been
discussed except by implication, the unintended side effects
of grades have frequently been reviewed in detail (Becker,
Geer, and Hughes, 1968; Marshall, 1968; Miller, 1967;
Milton, 1966; Milton, 1968; Raimi, 1967; Trow, 1968). A
large body of empirical data could be brought to bear on
the intended but unexamined purposes. Very little of the
extensive discussion about the unintended effects of grades
is based on systematic observations.



In illustration, a large amount of information is available
that bears on the use of grades in selection to more
advanced educational programs. Yet the philosophical,
social, educational, and economic justifications for the use
of grades in selection, which could be examined in the light
of that information, have been almost ignored. In contrast,
little more than personal impressions, at times probably
well-founded but at other times not, can be drawn on in
support of the widely discussed view that grades distort the
learning process. The known is ignored; the unknown is
described in detail.



Distortion of learning 

Distortion of the learning process can have a variety of
meanings. One is the belief of a large number of students
that the kinds of activities that produce good grades are
often not those that would produce optimal learning
(Becker, Geer, and Hughes, 1968; Education at Berkeley,
1968; Katz and Associates, 1968; Miller, 1967). The need
to maintain a gradepoint average high enough to assure
selection to graduate school, or permission to re-enroll for
the next academic year, is presumed to demand student
time and attention that could be spent more productively.
Specification by the instructor, either explicitly or
implicitly, of the details of what must be done to pass tests,
write acceptable papers, take part satisfactorily in class
discussions, or otherwise perform in ways that will be
rewarded with good grades constrains student behavior to
uniform tasks that may not be uniformly effective for all
students (Cole, 1966; Miller, 1967; Milton, 1967; Torbert
and Hackman, 1969).

The above argument does not contend that teachers
should not direct the learning activities of their students. It
does contend that students are capable of greater discretion
than is allowed by the present system of grades in the ways
they will respond to direction from the teacher. When the
fact of college graduation was in itself the critical
determiner of entry to desirable jobs and higher social status,
students were freer to control their own academic behavior.
The “gentleman's C” was often an acceptable level of
performance, and an occasional D was no more than a
temporary blow to self-esteem. With the mounting
importance of graduate education and of the gradepoint
average as a ticket of entry, student discretion in their
academic activities has been severely curtailed.

The increasingly common flirtation of colleges with
Pass-Fail grading has its origin primarily in the desire to give 
students wider latitude in their selection of courses. The 
reason most often given for introducing Pass-Fail options 
has been to free students from the constraints imposed by 
fears that courses in unfamiliar areas might damage their 
gradepoint averages. Ironically, the most common objection 
to the consequences of introducing a Pass-Fail option has
been to students exercising independence in another
way: in their allocation of study time and effort.

Prescribing in detail what students must do to earn a
satisfactory grade takes from them the responsibility for
deciding what is important. The importance of the
gradepoint average, which gives force to the instructor's
prescriptions for learning, prevents students from
experimenting, exploring different approaches, and learning
that some approaches will not work. But students are also
prevented from learning that some approaches other than the
instructor's may work admirably for them. The present
grading system therefore inhibits learning by not permitting
failure, or by making failure too costly for students to
experience (Torbert and Hackman, 1969).

A different kind of distorted learning results when 
students behave in ways unrelated to substantive learning, 
in forms of behavior calculated only to please the instruc- 
tor. Asking the right kind of questions, feigning interest in 
the instructor’s favorite topic, learning the style of answer
the instructor prefers, and other purely grade-oriented 
ploys may not be totally useless with respect to substantive 
learning, but their intrinsic value is limited. The grading 
system is said to be the primary cause of dissipation of
student effort in this kind of sterile, game-playing activity 
(Axelrod, 1968; Becker, Geer, and Hughes, 1968; Lavin, 
1965; Raimi, 1967; Torbert and Hackman, 1969). 

Bloom (1968) has suggested a third kind of constraint
grades may impose on learning. When teacher and students
alike start a course expecting that only a few will learn
enough to earn a top grade, and that some will learn no
more than enough to get a marginal grade or worse, the
expectations become self-fulfilling and reduce the
aspirations and performances of both teachers and students. A
prior history of earning average grades may put a ceiling on
student expectations and performance. Bloom contends
that most students in any particular class are capable of
achieving the goals of that class. The most effective
procedure and the time required for mastery of a course
may vary, but instructional procedures should be capable of
providing for variability in student predilections.

The present grading structure, in requiring that a
learning period is to end and grades are to be assigned after
a fixed period of time, imposes another constraint on
learning (Fordyce and Bromley, 1969-70; Raimi, 1967).
Successive courses in the same field are intended to be
integrated, the first leading into the second and the second
building on the first. When this occurs, the arbitrary ending
of a period of learning after a fixed number of weeks may
not be serious. But the adequacy of the integration of
learning that has been structurally fragmented has been
questioned (Sparks, 1969). The requirement that a student,
after a fixed period, either move on to the next learning
episode or repeat the entire process he’s just been through
seems dubious. Requiring students who have mastered a
course in less than the allotted time to continue to go
through the exercises of that course instead of moving on
seems equally questionable.

The grading system is not the only reason for organizing
learning into fixed periods of time. Some limitations are
necessary simply because of the need for one teacher to
accommodate a number of students. But present uses of the
grading system in the selection and classification of
students require that grades at least have the appearance of
quantitative as well as qualitative comparability. If grades
of students who took different courses in the same subject
are to be compared, the two courses must have some kind
of equivalence. Standardizing the time spent in the two
courses provides that equivalence. The retention of grades
as selection devices therefore acts, with other
considerations, to inhibit the introduction of greater
flexibility into the structure of education.

The lack of substantive meaning in grades (Ericksen, 
1966) is the primary reason for attempts to keep them, in 
some sense, equivalent. Because grades have no content 
other than the name of the course to which they are
attached, grades would have virtually no meaning without 
the comparability, limited as it is, provided by the number 
of weeks of instruction that a grade represents. A grade 
does not indicate what a student knows, for example, of 
the effect of a regulated economy on competitive equili- 
brium, but only that he completed a course in economics 
somewhat more (or less) satisfactorily than most students in
the course. Since they cannot be compared with respect to
the substantive learning they represent, grades in two
economics courses can be made comparable only in terms
of the amount of classtime spent in each course. 

If reports of student performance were descriptive, with
respect to the substance and the level of performance,
strained attempts at equivalence would be unnecessary.
Persons using grades in decisions about selection would still
be faced with developing some index of overall performance
or suitability from descriptive reports that would
often not be comparable. But this would not be an added
burden. It would only represent a shift of that burden from
those who teach to those who select. And those who select
would be able to specify their own criteria instead of
assuming that those used by the teachers were appropriate.
The lack of comparability that would appear in descriptive
reports is fully present in current grades and gradepoint
averages; it is only hidden by the failure of grades to convey
any substantive meaning. When no meaning is conveyed,
variation in meaning cannot be observed.

Becker, Geer, and Hughes (1968) described grades as
“the major institutionalized reward available for academic
work.” In their view, grades act in college the way money
does in society at large, as a medium of exchange for both
tangible and intangible valuables, but primarily intangible
ones in the case of grades. Grades therefore constitute a
major element of the social environment to which students
must accommodate. Their influence is ramified through
most aspects of student behavior, beyond classroom and
study activities into such areas as dating behavior and
informal relationships between students.

The complete faculty control of the exchange of grades
for academic performance puts students in a position of
subjection. Thus one of the commonly stated goals of
liberal education, training students to be intellectually
self-directing, is subverted. Yet students retain some
autonomy and can, through collective action, resist faculty
demands with some effectiveness (Becker, Geer, and
Hughes, 1968).

Wallace (1966) reported changes in attitudes toward
grades over the period of an academic year that emphasize
the role of collective action by students and the
socialization of students through interaction with other
students. Faculty members and nonfreshman students differed
substantially with respect to the importance attached to
different orientations toward college. The faculty members
valued grade-oriented activity more highly than did the
students. Over the course of the freshman year, the grade
orientation of freshmen moved away from that of the
faculty to a position consistent with that of the
nonfreshman students. Conflict between student and faculty
expectations with respect to grades and the power of the
socializing effect of the students are both indicated.

Distortion of teaching 

The need for some equivalence in grades prevents
instructors from varying their course content too far from a
generally accepted standard for that type of course (Miller,
1967). Many professors believe an A in a course cannot be
given without mastery of certain areas of content that
would be agreed on by other professors in the field.
Whether a particular professor agrees with the presumed
consensus among his colleagues or not, he may feel his
reputation endangered if he sends out students with A's
who could then be discovered not to have mastered some 
content area. The argument can be made that grades, in 
placing this kind of constraint on professors, are a desirable 
device for maintaining academic standards. But Miller 
(1967) lists this effect of grades among their deficiencies. 
Whichever view is taken, the function of grades in imposing 
instructional constraints is important enough to be 
examined. It has not been. 

The requirement that grades be given, in a certain form
and representing certain presumed accomplishments or
capabilities, is considered a major cause of a sterile but
common conception of teaching (Axelrod, 1968). An
instructor faced with a requirement to order his students at
the end of a fixed number of weeks with respect to their
relative accomplishment in his course is inclined to organize
the course in such a way that grading can be accomplished
simply and can be defended against attack by the students.
This often leads to common requirements for all students,
the setting of tasks that can be carried out mechanically
and therefore easily observed, and authoritarian control of
the activities of the students.

When instructors are required to assign ratings of merit
to students that will affect later decisions about those
students, they are put in the role of judge rather than
mentor. If the two roles are incompatible, as has been
contended (Axelrod, 1968; Marshall, 1968; Mayhew, 1969;
Raimi, 1967), then current grading practices must interfere
to some extent with learning. Students have great difficulty
ignoring the fact that their teachers will at some point grade
them. They leave questions unasked rather than risk
displaying ignorance. They stifle critical comments that
might lead to profitable clashes of ideas. They stay within
the instructor’s guidelines instead of stepping outside them
when an approach that looks intriguing has either already
been rejected by the instructor or has not occurred to him.
A large number of sometimes subtle but important
differences can be found between the behavior of someone
being taught and someone being judged.

In an analysis of the development of two new experimental
colleges, Riesman, Gusfield, and Gamson (1970)
described the effect of grading on faculty behavior and
relationships much as Becker, Geer, and Hughes (1968) had
done with grades and student behavior. “Grades serve not
only to sort and certify students but, more symbolically, to
sort and certify faculty vis-a-vis one another (Riesman,
Gusfield, and Gamson, 1970, p. 137).” The nature of the
student-faculty relationship, of the responsibility of faculty
for students, was reflected in the grading behavior of the
faculty. Interfield conflicts with respect to grading
philosophies developed in which students were able to play one
faculty point of view against another. Some faculty
members were put on the defensive, which ones depending
on the prevailing attitude toward grading and on the goals
of the institution as perceived by the faculty at large. The
role of grades as an implicit affirmation of faculty values
gives them an importance in faculty relationships not often
acknowledged.

Student attitudes and behavior

One of the common complaints about grades is that they
produce unnecessary anxiety in students (Benson, 1969;
Committee on Educational Policy, 1970; Funkenstein,
1968; Karlins, 1969; Pass-Fail Study Committee, 1969;
Raimi, 1967). Whether anxiety is desirable or undesirable in
a learning situation is a complex question. Personal
attributes of the student, the nature of the learning task, its
importance to the student, and the level of anxiety induced
all interact to produce widely varying effects. The only
statements to be made with reasonable confidence about
grades and anxiety are that the anticipation of being graded
does raise students’ anxiety levels and that anxiety is
usually unpleasant. These two facts probably account for
students’ overwhelming endorsement of Pass-Fail grading in
preference to conventional grades.

The introduction of a competitive atmosphere to
campuses and classrooms is attributed to grades (Becker,
Geer, Hughes, and Strauss, 1961; Bloom, 1968; Karlins,
1969; Miller, 1967). Its effects are considered both
desirable and undesirable and, like those of anxiety, are
probably mixed. Those who consider competition desirable
say it provides a valuable motivating force and gives
students useful experience in handling competitive
situations. Others say it interferes with learning by inhibiting
student cooperation and collaboration, by adversely
affecting students’ peer relationships, and by lowering student
morale.

Cheating is said to be a consequence of grades (Birney,
1964; Raimi, 1967) and may be one reflection of an
atmosphere of competition. One of the contentions of
proponents of Pass-Fail grading is that cheating is less
prevalent with that system than with conventional grades
(Committee on Educational Policy, 1970; Stallings and
Leslie, 1970). So far as is known, however, systematic
observations of the relationship between grading and
cheating have not been made.

Students’ decisions about graduate study were hypothesized
by James A. Davis (1966) to be affected by their
grades as undergraduates, relatively low grades acting to
discourage students from applying to graduate school.
According to Davis’s theory, the selectivity of the
undergraduate college would not be given much consideration by
the students. Average students at selective colleges would
then give up graduate school aspirations even though they
may be superior to top students at mediocre schools who
had their graduate school aspirations strengthened by an
undergraduate performance that was high only in relation
to a mediocre standard. Davis presented evidence from a
large-scale survey of college graduates that partially
supported his view. Werts and Watley (1968, 1969) provided
some confirmatory evidence for the theory, although Werts
(1968) raised questions about the adequacy of the analysis
for the purpose. Davis pointed out that the effect of the
process, if it occurs, would be to ensure the presence of
capable people (the mediocre students at the excellent
colleges) in occupations of relatively low prestige, such as
teaching.

Social effects of grades 

A largely unexamined but highly important aspect of 
grades is their effect on the social structure. The view that
grades are a mechanism by which education maintains the
existing class structure, controlling access to higher social
and economic levels, has been discussed earlier. Major
proponents of this view are Caplow (1954), Katz (1968),
Jencks and Riesman (1963), and Sexton (1967), but others
who have raised questions about the socially conservative
effects of grades are Ericksen (1967), Hoyt (1966), Lavin
(1965), and Tyler (1969). Clark (1962) considered the
socially constraining effects of education to have been
reversed in the present century as education became more
widely available. He developed the widely held position
that education acts as a mechanism for upward social
mobility and for reordering social positions in accordance
with individual merit rather than social origin. Whichever
view of the social effects of education is more accurate,
grades are an important mechanism for producing those
effects.

Few of the possible effects of the grading system are as
important as its role in either maintaining or reordering
social and economic positions. This alone should justify far
more intensive study of the grading process than has been
carried out. Most of the evidence on the effects of grading
consists of student reports of feelings or attitudes. Students
say they feel anxious about grades, but the level, effects,
and precise source of the anxiety are unknown. Some
students and faculty members say that grades interfere with
learning, supporting their statement with plausible
arguments but few pieces of evidence. That an educational
practice as important, as pervasive, and as much the subject
of contradictory views as grading should have had so little
systematic investigation is startling.



VI. TECHNICAL ISSUES IN GRADING 



Most of the preceding discussion of the forms, purposes, 
and effects of grades has been concerned with issues 
external to grades themselves. Yet the intrinsic
characteristics of grades (the processes through which academic
performance is judged, the ways those judgments are
translated into scaled symbols, and the composition and
stability of both the judgments and their translations into
symbols) to a large extent determine how well grades
perform their external functions.

Multiple components versus 
a single dimension of performance 

Academic performance can be considered the result of
some amalgam of inherent intellectual capability,
possession of relevant information, intellectual curiosity,
perceptiveness, analytical power, ability to synthesize concepts
into higher order abstractions, clarity of exposition and
expression, and other intellectual capabilities. Attitudes and
behavioral tendencies add more elements to academic
performance. Industriousness, commitment to an academic
field, responsiveness to instruction, intellectual integrity,
and some other attributes of personality are difficult to
distinguish from variables that are more explicitly
academic. Finally, most professors respond favorably to some
student attributes, such as physical attractiveness,
pleasantness of manner, or apparent earnestness, that are
irrelevant to academic performance but that sometimes color
judgment of performance. What peculiar combination of these
and other variables is reflected in an instructor’s evaluation
of student academic performance is never completely clear,
even when a course grade is determined entirely by the
mechanical accumulation of points on a set of examinations.
Different kinds of student performance reflected in
tests given at different points in a course, for example,
might add to identical totals and identical grades for two
students who differed sharply in the nature of their
performances. Although different kinds of performance
may be equivalent with respect to overall level, that
determination is seldom made and its implications seldom
explored. How much expository fluency is equivalent to
how much analytical skill is the kind of question too lightly
passed over in determining grades.
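
The point about identical totals can be made concrete with a small sketch. The two students, the three kinds of examination, the point values, and the letter cutoffs below are all hypothetical, chosen only to show how mechanical accumulation of points can hide a sharp difference in the nature of two performances.

```python
# Hypothetical point totals for two students on three kinds of examination.
student_a = {"factual recall": 45, "analytical skill": 20, "expository fluency": 25}
student_b = {"factual recall": 20, "analytical skill": 45, "expository fluency": 25}


def course_total(points):
    """Mechanical accumulation of points across all examinations."""
    return sum(points.values())


def letter_grade(total, cutoffs=((85, "A"), (70, "B"), (55, "C"), (40, "D"))):
    """Map a point total to a letter using assumed (illustrative) cutoffs."""
    for minimum, letter in cutoffs:
        if total >= minimum:
            return letter
    return "F"


for name, points in (("Student A", student_a), ("Student B", student_b)):
    total = course_total(points)
    print(name, points, "total:", total, "grade:", letter_grade(total))
```

Both students receive the same total and the same letter, although one profile is weighted toward recall and the other toward analysis. The grade alone gives no way to tell them apart.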

The multifaceted nature of academic performance has
been offered frequently as a major problem in the
interpretation of grades (e.g., Ebel, 1968; Milton, 1966;
Milton, 1968; Trow, 1968). Evidence that grades are
determined by various kinds of behavior is not hard to find.
Faculty members vary in the weight they give to such 
aspects of performance as effort and improvement 
(Axelrod, 1964). Medical school grades have been shown to 
be multidimensional (Haley and Lerner, 1967). English 
professors differ in the qualities they observe in assigning 
grades (Lewis and Smith, 1969). Faculty members in the 
physical sciences differ from those in the social sciences in
the expectations they hold for students (Garrison, 1967;
Riesman, Gusfield, and Gamson, 1970). The diversity of
academic performance seems incontestable. 

When a set of grades, each determined by a somewhat 
different set of attributes, is averaged, the qualities repre- 
sented by that average can only be guessed at. The 
argument has been made that averaging course grades is 
desirable because it compensates for the variable nature and 
uncertain assessment of the student attributes, capabilities, 
and performances that determine individual grades (Bramer, 
1970; Dale, 1969). Deviations from the average of the 
judgments of 20 to 40 instructors are said to cancel
themselves out, leaving a reasonably stable indicator of 
whatever is common to most faculty evaluations. But the 
nature of that common core is hard to identify. 
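
The cancellation argument can be illustrated with a minimal simulation. It assumes, purely for illustration, that each instructor's grade equals a shared level plus an independent random deviation; the function name, the 0-4 scale, and the size of the deviations are assumptions and are not drawn from the studies cited.

```python
import random
import statistics


def spread_of_average(n_instructors, n_trials=2000, noise_sd=0.7, seed=0):
    """Standard deviation of a student's average grade across repeated trials.

    Each grade is modeled (as an assumption) as a common level plus an
    independent instructor-specific deviation.
    """
    rng = random.Random(seed)
    common_level = 2.5  # hypothetical shared component on a 0-4 scale
    averages = []
    for _ in range(n_trials):
        grades = [common_level + rng.gauss(0, noise_sd) for _ in range(n_instructors)]
        averages.append(statistics.fmean(grades))
    return statistics.pstdev(averages)


spread_one = spread_of_average(1)      # a single instructor's judgment
spread_thirty = spread_of_average(30)  # an average over 30 instructors
print(round(spread_one, 2), round(spread_thirty, 2))
```

Under these assumptions the average over 30 judgments varies far less than any single judgment. The simulation shows only that the average is stable. It says nothing about what the common level consists of, which is the difficulty raised in the text.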

Boldt (1970) has recently provided empirical support for
the existence of a single dimension underlying performance
in a number of courses. At two different graduate schools
of business, variation in student performance across 31 and
70 different courses could be accounted for almost as well
by one dimension of performance as by two or three. Even
though some courses were quantitatively oriented while
others were heavily verbal, performance in those two types
of courses could not be clearly differentiated. Boldt
concluded that “the present study uncovers no reason to
reject gradepoint average as a simple and descriptive index
of achievement (p. 23).”

In spite of the study’s limitation to graduate courses in
business, in which about 90 percent of the grades were
either A or B, substantial support is given the view that
gradepoint averages represent quite well some composite of
whatever kinds of academic performance are reflected by
grades. But the nature of that composite dimension (the
components of student performance that it combines into a
single measure) remains undefined. Its usefulness beyond
predicting second-year grades from first-year grades would
be enhanced if its components and their interrelationships
were known.

Another study related to grading in graduate schools of
business suggests that academic achievement may be treated
either unidimensionally or multidimensionally, depending
on the situation. A set of rating scales was developed that
described 13 qualities faculty members in graduate schools
considered desirable in their graduates. A total of 191
first-year students at two schools were rated on the 13
qualities by 27 different faculty members, with each
faculty member rating up to 10 students (Hilton, Kendall,
and Sprecher, 1970).

Five of the 13 attributes (perspective and breadth of
knowledge, technical knowledge, critical awareness,
problem analysis ability, and communication skill) can be
considered modifiable by instruction. The other eight, such
as persistence, initiative, and flexibility, seem less accessible
to instructional change but may nevertheless affect
judgments about student performance. These 13 desirable
attributes, but particularly the five subject to change under
instruction, might be expected to vary in importance across
courses and appear as distinct dimensions in studies such as
Boldt’s. But the ratings of students on the five modifiable
attributes were all highly interrelated and were all
moderately related to first-semester gradepoint averages.

For some purposes, these qualities may be treated as
distinct attributes, but they can also be considered as
somewhat different components of a single dimension of
academic performance. The choice between considering
performance a single dimension or several should depend on
its use. Present practice is to treat academic performance as
a single, global entity. Greater knowledge about its
components and their relationships to other kinds of
performance should lead to better student evaluation, better
grading, and more effective use of grades.

Reliability or consistency of grades

The reliability of grades can be observed in several ways,
each involving some aspect of consistency. The internal
consistency of grades is a measure of the degree to which
the various observations made by a particular instructor to
arrive at judgments about the grades of students in one of
his courses reflect a common form of academic performance.
For example, if the instructor’s evaluations,
whether of written papers, objective examinations, or
observations of classroom performance, all depend heavily
on the recall of factual material, his grades are likely to
show a high index of internal consistency. Course grades
can also be consistent (that is, show the same relative
ordering of students) across instructors teaching the same
course, across different courses taught by the same
instructor, across different classes taught by the same
instructor in the same course, and across time. Most of
these situations, such as different instructors teaching the
same course to the same student, exist only hypothetically,
but they illustrate the varied meaning of reliability.
Consistency or reliability in any of these other forms is
limited by the internal consistency of the grades of
individual instructors.

The reliability of grades is clearly related to their
dimensionality. As the number of attributes considered in
assigning grades increases, the reliability is likely to
decrease. Reliability can stay moderately high, however, if the
various attributes observed by an instructor are themselves
highly related. The high reliability of grades across courses
and instructors, for example, in spite of differences in
course emphases and methods of evaluation, is probably
due largely to the common element of verbal ability in
most academic evaluations.

Two studies in recent years have measured the reliability
of grades, using different procedures but with similar
results. Clark (1964) defined reliability as the ratio of the
variance of individual gradepoint averages to the total
variance of all grades. For 18 classes of freshman women at
Northwestern University from 1931 to 1959, with an
average of almost 300 women in each class, reliability
coefficients ranged from .70 to .80 with a median of .74.
Barritt (1960), using a simpler but somewhat analogous
computational procedure that consisted of computing
correlation coefficients between random halves of students’
grades, found freshman grades for 237 students at Indiana
University to have a reliability of .34.
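
The two procedures described above can be illustrated with a minimal sketch. It assumes each student's grades are available as a list of grade points and that all grades carry equal weight; the function names are hypothetical, and neither the weighting nor any correction applied in the original studies is stated in the text, so none is applied here.

```python
import random
from statistics import mean, pvariance


def variance_ratio(grades_by_student):
    """Variance of student gradepoint averages over total variance of all grades."""
    gpas = [mean(grades) for grades in grades_by_student]
    all_grades = [g for grades in grades_by_student for g in grades]
    total = pvariance(all_grades)
    return pvariance(gpas) / total if total else 0.0


def pearson(xs, ys):
    """Plain Pearson correlation; returns 0.0 when either list has no spread."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0


def split_half_correlation(grades_by_student, rng):
    """Correlate the averages of two random halves of each student's grades."""
    first, second = [], []
    for grades in grades_by_student:
        shuffled = list(grades)
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        first.append(mean(shuffled[:half]))
        second.append(mean(shuffled[half:]))
    return pearson(first, second)


if __name__ == "__main__":
    # Synthetic data for illustration only: 50 students, 8 grades each.
    rng = random.Random(0)
    data = []
    for _ in range(50):
        level = rng.uniform(1.5, 3.5)
        data.append([min(4, max(0, round(level + rng.gauss(0, 0.7)))) for _ in range(8)])
    print(variance_ratio(data), split_half_correlation(data, rng))
```

Both indices approach 1 as each student's grades become more uniform across courses, which is the sense in which they measure consistency.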

Both these studies show that a large part of the
information in freshman grades can be associated with some
unidimensional concept. They do not rule out the usefulness
of a more complex conceptualization of grades. They
do indicate, with Boldt (1970), that treating grades as
though they represent a single, general kind of academic
performance is a sound procedure.

The common assertions about the unreliability of grades
usually refer to the inconsistency across faculty members
with respect to their judgments about the quality of a
particular piece of student work, usually a written paper or
essay examination (e.g., Stewart-Tull, 1970). This source of
inconsistency may be due either to variations across
instructors in the attributes they consider important or to
inability of faculty members to make consistent judgments.

The temporal stability of grades can be affected by
inconsistency of any sort. In a study of grades at the
University of Illinois between 1962 and 1966, correlations
between grades in adjacent semesters were moderately high,
around .55 to .60 (Humphreys, 1968). The correlations of
first-semester grades with high school rank and an achievement
test were both about .50. But the correlations of the
same predictors with grades in each succeeding semester
declined regularly and dramatically. High school rank
showed a correlation of .22, for example, with grades in the
eighth semester of college. Similarly, the correlations
between grades in different semesters declined regularly
from about .54 to .34 for a constant sample of students as
the time between semesters increased. A virtually constant
standard deviation through all eight semesters dispenses
with the possibility that reduced variability in the later
semesters could account for the declining correlations.

While grades are consistent across courses in any one
semester, they are not very stable over an extended period
of time. Either the nature of the academic performance
that grades reflect is modified as students progress through
college, or else students fluctuate substantially from year to
year in their performance. In either case, predictions of
performance beyond the next academic year are dubious.
Also in either case, knowledge of the nature and behavior
of the components of academic performance would be
valuable.

Adjusted grade distributions 

Faculty members tend to be concerned about the
problems of evaluating students and arriving at grades that
will summarize performance on the variables considered
important in a particular course. How a student performs in
other courses and the nature of the performances considered
important in other courses are clearly irrelevant.
Variation across courses in the nature of student performance
is accepted as inevitable and proper and attracts
no one’s concern. Variation across courses in the level of
performance judged to have been reached by the students
does attract attention and is considered undesirable (Juola,
1968).

The concern over variation across courses in the average
level of student performance and the acceptance of
variation across courses in the nature of student performance
are not necessarily inconsistent. The desire for
students in dissimilar courses to have similar distributions
of grades is no more than a desire for the grade scales to be
comparable in different divisions, departments, and courses.
Only if that is the case can the grades of all students in all
courses be considered equivalent and capable of being
summarized in a gradepoint average.

The legitimacy of gradepoint averages, in contrast to
individual course grades, is the concern of deans, of faculty
members when they are serving on admission committees,
and of directors of institutional research. That concern is
reflected in the suggestion that the grade distributions
within any class be adjusted to the capabilities of the
students in that particular class, as indicated by academic
aptitude test scores or previous grades or both
(Anderhalter, 1962; Berdie, 1965; Fricke, 1965; Grant,
1956). A class composed predominantly of A and B
students would receive a predominance of A’s and B’s. A
class that included a wide range of capabilities would
receive a wide distribution of grades. Grades would therefore
be approximately equivalent across all classes.
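
One way to picture such an adjustment is the sketch below. It is only an illustration under stated assumptions: the instructor supplies his own evaluation score for each student, the class's previous grades are summarized as one letter per student, and the new grades simply reproduce that previous distribution in rank order. The cited proposals do not specify a mechanism, and the function and variable names are hypothetical.

```python
from collections import Counter

GRADE_ORDER = ["A", "B", "C", "D", "F"]  # best to worst


def adjusted_grades(scores, prior_grades):
    """Assign letters by rank so the class matches its previous grade distribution.

    scores: dict mapping student -> instructor's evaluation score (higher is better).
    prior_grades: list with one previous letter grade per student in the class.
    """
    if len(scores) != len(prior_grades):
        raise ValueError("need exactly one prior grade per student")
    quota = Counter(prior_grades)
    # Letters available to the class, best first, e.g. ["A", "A", "B", "C"].
    letters = [g for g in GRADE_ORDER for _ in range(quota[g])]
    # Students ordered by the instructor's own evaluation, best first.
    ranked = sorted(scores, key=scores.get, reverse=True)
    return dict(zip(ranked, letters))


if __name__ == "__main__":
    scores = {"ann": 91, "bob": 78, "cy": 85, "di": 60}
    print(adjusted_grades(scores, ["B", "A", "A", "C"]))
```

Note that the ranking comes entirely from the instructor's evaluation, while the number of each letter awarded comes entirely from the class's earlier record.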

Computers make it possible for each instructor to
receive, soon after the start of a term, a report of the
distributions of the previous grades and test scores of the
students in each of his classes. He need not have that
information about individual students and is under no
obligation to assign any particular grade to any individual.
But he would know the general level and range of
performance to be expected in each class and could adjust
the eventual grade distribution of each class accordingly.

As Fricke (1965) pointed out, adjusting grade distributions
to student capabilities is an effective resolution of a
problem of grading that is distinct from problems of
evaluation. Judging the nature and quality of students’
performances is evaluation and results in a ranking of
students within the group evaluated. Then the determination
of where to place those students on some grade
scale (how many, if any, of the top students merit A’s, how
far down the scale the lowest student falls, and where the
students in between belong) is a new problem: grading.
Adjusting grades seems to assure comparability of the grade
scale across departments and classes differing in student
capability by anchoring the grades in any class to scores on
a common aptitude test taken by most of the students or to
the averages of the students’ previous grades. But its
independence of the evaluation process accounts for at least
two deficiencies in the procedure, and others exist as well.

First, a common scale can be used in the assessment of
dissimilar objects only if the objects possess some common
attribute. No manipulation of numerical scales can make
inherently different concepts equivalent in any very useful
sense. The academic tasks in chemistry classes are different
from those in literature classes. Adjusting chemistry grades
and literature grades in accordance with a common anchoring
variable is justifiable only to the extent that the
anchoring variable is associated with both chemistry and
literature. Distinctions between performance in chemistry
and literature will be systematically de-emphasized, even
though the areas that distinguish between the two fields
may be those most worth emphasizing. Any other source of
variation in the meaning of grades, as well as differences
across fields, such as basing some grades on the recall of
factual material and others on comprehension of complex
relationships, further detracts from the comparative meaning
of grades in a way that cannot be remedied by
adjustments to the grade scale.

A second objection was raised by Gold (1966), who
pointed out that academic performance is a consequence of
the activities of teachers as well as students. Under Fricke’s
proposal, the relative achievement of two comparable
classes, one brought to a high level of performance by an
outstanding instructor and the other left at a relatively low
level of performance by a poor instructor, would be
indistinguishable. Appropriate evaluation would reveal the
differences in performance; grading adjustments would
hide them.

Two other objections can be made to adjusted grades.
One is that the process is biased against recognition of
change in performance. Even though an individual student’s
grade is free to take any level, the number of A’s in a class
is constrained by the previous performance of the class as a
whole, and a sharp growth in general student interest and
performance will not be recognized in the grades students
receive. If grades have a motivating effect, as many
contend, that effect might be curtailed or even reversed; the
lack of recognition of improved performance could discourage
further improvement. Although this potential
damping effect of grade adjustments on changes in performance
might be negligible, no evidence is available on which
to base such a judgment.

The final objection is that grade adjusting requires the
assumption that criteria of performance are constant across
courses and from one year to the next. If grades in a
particular course are intended to reflect an understanding
of complex interrelationships in the flow of history, for
example, adjusting them to previous grades that indicate a
variety of kinds of performance, including things like
remembering taxonomies or applying rules of integration in
calculus, makes little sense. Adjusting grade distributions to
previous gradepoint averages gives disproportionate weight
to those few elements of academic performance, whatever
they may be, that are common across all courses.

Interactive determinants of grades

Other writers, as well as Gold (1966), have pointed out
the complex, interactive nature of the determinants of
grades. Ericksen (1966) described grades as the result of
“an extremely complicated interaction between a teacher,
students and a body of knowledge.” Haagen (1964) added
the effects of the institutional climate and of society at
large, but also stressed their interaction with student and
instructor characteristics. Variations in faculty standards
(Axelrod, 1964; Juola, 1966; Kirby, 1962; Trow, 1968;
Webb, 1959), in departmental standards (Aiken, 1964;
Anderhalter, 1962; Garrison, 1967; Juola, 1968; Kelly and
Thompson, 1968; Pemberton, 1969; Trow, 1968), and in
average student capabilities from year to year (Aiken, 1963;
Bowers, 1967; Hills and Gladney, 1968; Miller, 1969;
Webb, 1959) are all influences on grades that are beyond
the student’s direct control. If grading is to be free from
effects not under the student’s control, some approach to
an absolute standard is necessary.

Absolute versus relative standards

Inequities in relative grading standards, due to any of the
sources of variation beyond the student’s control, may be
avoided by establishing absolute standards and making each
student’s grade independent of any other student’s grade.
Although relative grading standards and grading “on the
curve” have been dominant over absolute standards for half
a century, a resurgence of interest in absolute grading is
occurring in the guise of criterion-referenced testing (Ebel,
1962; Richards, 1970). Work in programmed learning
requires the determination of absolute levels of performance
to direct the learner to the next stage of instruction.
Individualized instruction in any of its varied forms
similarly requires absolute scales of performance.

One consequence of an absolute grading standard is the
opportunity to avoid the fixed time period of a semester or
quarter in evaluating achievement. If, as is generally
acknowledged, students learn at different rates, permitting
students to use varying amounts of time to reach a desired
level of achievement seems preferable to the present system
of applying the same evaluative standards to all students at
the end of a predetermined number of weeks (Bloom,
1968; Dressel and Nelson, 1961). Evaluation and certification
of achievement at the end of variable periods of time
require the development of absolute standards.

Yet the argument over relative or absolute grading 
standards is to some extent a false issue. Even in the British 
system of external examiners and in criterion-referenced 
testing, the “absolute” standard is established in relation to
some expectation of performance based on past experience
with examinees in similar circumstances. The real issue is in
specifying the source of the standard on which grades are to
be based. Neither a narrowly defined relative standard that
results in a fixed distribution of grades throughout a class
regardless of the general level of the class’s performance,
nor a rigid standard based on scores on a standardized,
externally administered test, seems desirable. But the 
decision as to what standard should be applied must be 
reached with some care, and that decision cannot reason- 
ably be reached without consideration of the purposes for 
which grades are to be used. 

External versus internal evaluations 

Consideration of absolute standards suggests placing the
responsibility for student evaluation and grading in an
agency different from the agency providing the instruction,
as is done regularly in England and sporadically in the
United States. External evaluating agencies almost invariably
are concerned with summative rather than formative
evaluation. This distinction is important partly as a way
to emphasize the point that giving the task of summative
evaluation to an external agency removes neither the
opportunity nor the responsibility for evaluation from the
teacher. Formative evaluation, which is the form most
closely tied to the instructional process, remains a major
responsibility of the teacher even when summative evaluation
occurs externally.

When summative evaluation, with which grading is
usually associated, is performed by an external agency, the
competence of the students examined is certified according
to some generally accepted standard. To the extent that
grades are used outside the instructional institution, as in
selection of graduates by other institutions or employers,
the certification of an external agency might well replace
grades (Goodman, 1964; Jencks and Riesman, 1968).

Placing the process of summative evaluation in an
external agency does not necessarily remove grading entirely
from the instructional institution. Just as other
agencies may use the summative evaluations in selection,
the instructional institution can use those evaluations for
whatever internal purposes grades serve. These might
include advancing students to higher-level courses, awarding
honors, encouraging promising students, or determining
eligibility for extracurricular activities. For some purposes
the external evaluations might be translated into grades
within each institution, or within departments in an
institution. This translation of external evaluations into
internal grades might take account of the multitude of
variables other than strict academic achievement that now
enter the determination of grades: variables such as
whether the course is part of the student’s major field, the
student’s industriousness and attitudes toward the course,
and the relative performance of other students in the
course. These grades, with all their variation across situations,
would then be distinct from those intended only to
indicate academic accomplishment and could be tailored to
the specific purposes desired.

A major objection to external evaluation is its total 
dependence on limited observations of student performance 
conducted over a brief period of time. A student's 
instructors can almost certainly provide judgments about 
his capabilities that would not be duplicated by an
examination, either internal or external. Although ad- 
vantages and disadvantages can be found in each procedure, 
the choice between an external or internal examining 
agency depends heavily on the purpose of the examination 
and partly on the mundane issue of who should bear its 
cost. 



VII. POSSIBLE DIRECTIONS FOR COLLEGE GRADING



Although college grading is currently the subject of 
widespread controversy, the points in hottest dispute are 
not fundamental issues. The liveliest issues today are how 
many grade categories to use, how to predict grades more 
accurately, and to a lesser extent, how to make grades 
comparable across courses, departments, and institutions. 

The following issues that seem properly to merit prior 
consideration have been raised but not pursued. How well 
do grades serve the purposes for which they are intended?
Do those purposes merit the enormous expenditures of
time and energy grading entails? Would alternative ways of
accomplishing the same purposes be preferable to current 
grading procedures? What are the unintended consequences 
of current grading practices, both for society at large and 
within the educational process? These issues involve exter- 
nal effects of the grading system. 

Problems within the grading system influence external
issues but can be pursued independently of them. The
primary internal issue is that raised by Westland (1969):
What do grades represent? The reasonably good internal
consistency and short-term reliability that have been
demonstrated indicate that grades in general, across varied
courses and instructors, do reflect some common attribute.
But that attribute can be called academic achievement,
directed knowledge, verbal proficiency, academic facility,
intellectual servility, or whatever is most commonly found
to please professors. It is probably some complex entity in
which several independent attributes are merged, as mass
and volume are merged in density or as the height and
weight of persons are merged in size. The most fruitful
expenditure of effort with respect to the structure of grades
would be that directed toward identifying the various
components that underlie grades and assessing their interrelationships
and fluctuations across fields, types of
courses, professors, and students.

Knowledge of the various determinants of grades would
then facilitate the study of external problems, such as
improving the selection and feedback processes. Grades
could be made to reflect directly their various underlying 
dimensions, and selection procedures could be varied to suit 
the purposes of the selecting institution. Equivalence would 
no longer need to be forced onto inherently different 
measures. Prediction could probably be improved. Instruc- 
tional goals could be more carefully defined and instruc- 
tional effectiveness more adequately assessed. 

A procedure such as that suggested by Elbow (1969)
would provide the advantages of descriptive grading but
could be carried out without many of the inconveniences
presented by unsystematized prose statements in reporting
achievement. A study of the current processes of student
evaluation at an institution could reveal the most common
dimensions of student performance that faculty members
consider important in their own courses. An institution’s
evaluation of its students might be desirably broadened by
including dimensions of performance found important in
studies at other institutions, such as those by Hilton,
Kendall, and Sprecher (1970) or Junius Davis (1964, 1965,
1966). Simple rating scales based on the desired dimensions
could then constitute the basic achievement report. As
Elbow suggests, faculty members could choose those
dimensions they consider appropriate to describe a
student’s performance, remaining free, as at present, to
determine the evaluation procedures on which their ratings
would be based. Students might, as Feldmesser (1969) and
Wolfle (1968) suggest, be involved in decisions as to which
dimensions to include in the evaluations, and these could
vary with different students. Averaging would occur only
with respect to common dimensions and much of the
richness of descriptive grading could be achieved without its
administrative inconveniences.
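
A minimal sketch of averaging only over common dimensions follows. It assumes each course report is a mapping from a dimension name to a numeric rating and treats a dimension as "common" when it appears in at least two reports; that threshold, the dimension names, and the function name are illustrative assumptions, not part of Elbow's proposal.

```python
from collections import defaultdict
from statistics import mean


def average_common_dimensions(reports, min_reports=2):
    """Average ratings per dimension, keeping only dimensions rated in several courses.

    reports: list of dicts, one per course, mapping dimension name -> rating.
    min_reports: how many courses must rate a dimension for it to be averaged
                 (an assumed cutoff for what counts as a common dimension).
    """
    ratings = defaultdict(list)
    for report in reports:
        for dimension, rating in report.items():
            ratings[dimension].append(rating)
    return {
        dimension: mean(values)
        for dimension, values in ratings.items()
        if len(values) >= min_reports
    }


if __name__ == "__main__":
    reports = [
        {"factual recall": 4, "written expression": 3},
        {"factual recall": 2, "laboratory technique": 5},
    ]
    print(average_common_dimensions(reports))
```

Dimensions rated in a single course stay in that course's report and are never forced into a summary figure.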

An apparent deficiency in differentiated grading may be
the tendency for a student’s excellence in one area to color
faculty judgments of his performance in other areas, the
“halo effect.” This tendency is equally present in any form
of grading but is more apparent when grading is differentiated
into several dimensions. On the other hand, the
explicitness of the various dimensions of performance may
reduce the “halo effect” by making faculty members more
aware of distinctions in types of performance.

One of the most important consequences of greater 
knowledge of grade components would be an increased 
likelihood of demonstrating connections between academic
performance and behavior outside the academic setting. 
While academic growth, as represented by advancement 
through collegial institutions, may be defended as inher- 
ently desirable, it would gain public support, recognition, 
and understanding if its importance to nonacademic enter- 
prises could be shown more convincingly. In a period of 
increasing calls for accountability in higher education, few 
issues seem more pertinent than a demonstration of what is 
meant by, implied in, or associated with the kind of 
academic growth that colleges claim good grades represent. 

A major deficiency in current grading procedures is their 
broad uniformity in spite of the variety of functions 
demanded of them. An obvious direction for improvement 
would be to vary the form to suit the purpose. Some form 
of differentiated grading at midterm, for example, would 
probably serve the feedback function of grades far better 
than do present procedures. Whether these grades should be 
retained in a student’s central record, only in the instructor’s
records, or not at all would depend on how well they
were suited to purposes other than feedback to students 
and on the availability of other methods to accomplish 
those purposes. 

For many of the administrative purposes within the
institution, grades seem unnecessary other than as an
indication that a student has completed a course satisfactorily.
Eligibility for various activities, or for considerations
such as veterans’ benefits and other financial assistance,
seems to justify no requirements other than bona fide
status as a student. Whether one student is more or less
capable than another has no obvious relevance to administrative
considerations associated with status as a student.
Pass/No Record grading would effectively serve this kind of
administrative purpose. 

Selection within the institution for academic awards, 
honor programs, or special classes could well be based on
faculty nominations supported by evaluative information 
provided by the faculty. If this practice were followed, 
faculty members might retain their own records of differ- 
entiated grading reports to students. From these, informed 
nominations could readily be made. The nature of these
purposes makes detailed information on all students un- 
necessary. 

Evaluations of the effectiveness of different programs or
departments or of new instructional procedures depend on
summative evaluations of student performance. These could
be carried out internally or by an external agency and could
take a variety of forms, including comprehensive examinations,
evaluation of student products, or evaluation of
student portfolios accumulated during a course. A major
consideration in evaluations for these purposes is that all
students need not be evaluated. The teaching effectiveness
of any department can be adequately assessed by testing a
sample of its students and testing individual students in the
sample only partially.

If 300 students have taken a two-semester course in
economics, for example, the effectiveness of that course
sequence could be evaluated by having six groups of 20
students each take a 1-hour segment of a 6-hour comprehensive
examination. The unexamined 180 students could
be assigned to comprehensive examinations in other
courses. Since the examinations would be used to assess
courses and instruction rather than students, most of the
anxiety associated with examinations should be avoided, at
least in the students if not in the instructors. Assignment of
students to examinations could be done randomly just
before the exams are given so students would not study
disproportionately in the area in which they were to be
examined.
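
The sampling scheme in this example can be sketched as follows, assuming students are identified by any hashable label and that assignment is a simple random shuffle; the function name and the `seed` parameter are hypothetical conveniences for illustration.

```python
import random


def assign_exam_segments(students, n_segments=6, group_size=20, seed=None):
    """Randomly assign one group of students to each 1-hour exam segment.

    Returns (groups, unexamined): groups maps a segment index to the students
    taking that segment; unexamined lists students left free for other exams.
    """
    pool = list(students)
    needed = n_segments * group_size
    if needed > len(pool):
        raise ValueError("not enough students for the requested groups")
    rng = random.Random(seed)
    rng.shuffle(pool)  # random order, so no one can predict a segment in advance
    groups = {
        segment: pool[segment * group_size:(segment + 1) * group_size]
        for segment in range(n_segments)
    }
    return groups, pool[needed:]


if __name__ == "__main__":
    groups, unexamined = assign_exam_segments(range(300), seed=1)
    print({segment: len(members) for segment, members in groups.items()})
    print(len(unexamined))
```

With the defaults, 120 of the 300 students are each examined on one segment and the remaining 180 are left for examinations in other courses, matching the figures in the text.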

Certification of students’ accomplishments in certain
areas, when a more elaborate indication than satisfactory
completion of a specified set of courses is desired, could be
accomplished by some form of summative evaluation in
which individual students were examined in all pertinent
areas. Again, these could be internal or external examinations.
Whether the instructional institution provided evaluative
information for other agencies, such as graduate and
professional schools, could be determined by each institution.
A useful procedure might be for undergraduate and
graduate institutions to plan jointly for a form of summative
evaluation that would serve both institutions. Students
uninterested in advanced education need not be subjected
to that evaluative procedure.

A number of faculty members at the University of
California at Berkeley favor the use of summative evaluation
at the end of a multicourse sequence (Education at
Berkeley, 1968). Grades would not be assigned in the early
courses in the sequence, but performance in these courses
would be reflected in the super grade at the end of the
sequence. Grading, in their view, would be improved by
reducing its frequency and increasing its comprehensiveness.
Raimi (1967) made a similar proposal but suggested
that the periodic comprehensive examinations not be
tied to any particular sequence of courses. These procedures
are sound in terms of summative evaluation and its
purposes. Formative evaluation would, as is almost always
the case, be another matter.

The purposes grades serve need clearer identification and
more intensive examination to justify the expenditure of
resources for their accomplishment and to determine the
most effective ways they can be accomplished. The current
outmoded and largely ineffective grading procedures should
be replaced with procedures more specifically directed to
their intended purposes. More varied and more effective
procedures are available. That they have been used so little
may be due to uncertainty or confusion over what is really
wanted of grades.

In summary, we don’t know what present grades
represent as indexes of academic performance. The current
issues surrounding grades and grading cannot be effectively
faced until we do. When the components and structure of
grades are better described, we will be able to attack not
only the current, rather limited issues, but the more
substantial ones that bear heavily on the entire higher
educational enterprise.